Final Report: Random Number Generation for High Performance Computing

ABSTRACT 

The primary objectives of Phase II of the project were to (a) implement the context-aware parallel random number generator (CPRNG)
developed in Phase I of this project, and (b) implement the interstream correlation (ISC) test so that the quality of the random numbers (RNs)
used by applications is evaluated and quality metrics are reported on demand. Both objectives have been accomplished.

Beyond these objectives, additional design and implementation contributions were made. A flexible CPRNG-ISC Test (CIT)
framework was developed and implemented so that a third-party tester such as Dieharder or TestU01 can be run alongside the ISC test to
corroborate or compare ISC test results with those from the well-known single-stream test batteries. The CPRNG Library makes it easy to
implement and use other random number generators within the test framework.

To demonstrate the flexibility of the CIT framework, we implemented the MLFG generator from the SPRNG package together with a number of
other generators, some of which have become available since the beginning of this Phase II project.

Three versions of CPRNG were implemented: a CPU-based context-free generator, a CPU-based context-aware generator, and a GPU-based
context-free generator.

1. Accomplishments


A flexible CPRNG-ISC Test (CIT) framework was developed and implemented so that a third-party tester such as Dieharder or
TestU01 can be run alongside the ISC test to corroborate or compare ISC test results with those from the well-known
single-stream test batteries.

The  CPRNG  Library  is  implemented  in  a  flexible  manner  to  facilitate  implementation  and  use  of  other  random  number 
generators  within  the  test  framework  easily  (see  Figure  1  in  the  attached  document). 

To demonstrate the flexibility of the CIT framework, we implemented the MLFG generator from the SPRNG package [7]; drand48,
which is available on standard Unix/Linux systems; a parallel RNG based on cryptographic operations from the family of generators
proposed by the D.E. Shaw Group [12]; and a pathological linear congruential generator (pLCG). In addition, the CPRNG Library
provides access to Intel’s new digital random number generator (DRNG) and Nvidia’s GPU-based generator
MTGP32 [6] when the host system has the necessary hardware (newer processor chips or GPUs, respectively) to support
these generators.

Three versions of CPRNG were implemented: a CPU-based context-free generator, which is used to produce the results in this report;
a CPU-based context-aware generator; and a GPU-based context-free generator. A context-aware generator automatically, without
any changes to the application code, uses distinct RN streams when the application requests RNs from a stream in
different program contexts.

2.  Performance  Analysis 

CPRNG, the new parallel random number generator developed in Phase I of this project, was implemented within the SPRNG package in
Phase I. In Phase II, it was implemented as a standalone library package with a simple application programming interface. The
results given in Figure 2 of the attached document indicate that, relative to the Phase I implementation, the time to initialize
an RN stream decreased slightly and the time to obtain an RN was reduced by 20-30%. CPRNG generates RNs with very low overhead.

The  ISC  Test  was  used  to  determine  the  interstream  correlations  for  MLFG  and  CPRNG  in  Phase  I.  In  Phase  II,  several  other 
random  number  generators  implemented  within  the  CPRNG  Library  have  been  evaluated  for  interstream  correlations  using  the 
CPRNG-ISC  Test  framework.  The  results,  given  in  Figure  3  of  the  attached  document,  show  that  CPRNG  generates  a  large 
number  of  parallel  RN  streams  with  low  interstream  correlations. 

The CIT framework is used to compare the quality metrics produced by the ISC test, the DR and KS statistics [8],[9], with the results of the Dieharder [11]
and TestU01 [10] test batteries that are commonly used in the literature. In general, the two test batteries corroborate each other’s
test results for a given stream of RNs. The Ising model simulation [5], which simulates the spread of energy in a 2-D lattice and
for which exact theoretical results are available, is the application we used to corroborate or refute the results of the various test
methods.

For the pathological linear congruential generator (pLCG), which is designed to have high interstream correlations, the ISC test
and the two test batteries indicate significant correlations among RNs. This is confirmed by the Ising model simulations. On the
other hand, drand48, a sequential generator commonly available on Linux and Unix systems, is reported to have high
correlations by Dieharder and TestU01. However, the ISC test does not indicate any correlations, and the Ising model simulations
confirm the ISC test results.

3. Business and Dissemination Plan

The  major  components  supporting  long  term  sustainability  of  CPRNG  include: 

•  Preservation  of  all  the  software  files  and  documentation, 

•  Development  and  growth  of  a  body  of  users,  and 

•  Continuation  of  CPRNG  commercial  licensing  efforts,  with  a  primary  objective  of  licensing  to  a  strategic  partner  such  as  a 
processor  manufacturer  (for  inclusion  in  its  tools  library),  or  system  manufacturer  or  vendor,  or  a  major  software  provider. 

3.1.  Rationale 

In  order  to  accomplish  the  above,  Silicon  Informatics  plans  to  assign  its  interest  in  CPRNG  software,  the  “GetCPRNG.com” 
domain  and  associated  website  creative  files  to  the  University  of  Texas  (UT)  System.  To  the  extent  practicable,  Silicon 
Informatics  will  continue  to  support  the  development  of  a  user  community  and  commercial  licensing  efforts. 

The innovations developed during the course of this project are extensive and further ahead of their time than was anticipated at
the outset of Phase II of this project. Despite our extensive outreach to the government, academic and private sectors, users’
needs appear to be met with conventional tools, including those that have been introduced since the inception of this project.
One possible explanation is that much of the high performance computing software used for production runs has yet to be
adapted and optimized to run on GPGPU-enhanced machines. [This information was presented by IDC in conjunction with
the SC14 Conference in New Orleans, LA, Nov 2014; a copy of the presentation was provided to the COTR.] Despite these software
issues,  semiconductor  manufacturers  continue  to  innovate  in  areas  of  parallelization,  processor-coprocessor  integration  and 
memory.  One  example  is  Intel  Corporation’s  Knights  Landing  processor,  a  “many  integrated  core”  architecture  that  competes 
with  GPGPU  products.  Knights  Landing,  with  first  shipments  expected  in  2015,  will  work  as  a  host  processor,  capable  of 
running  an  OS  and  applications  on  its  own,  while  at  the  same  time  functioning  as  a  coprocessor.  Another  example  is  Nvidia’s 
Titan  X  GPGPU,  announced  at  Nvidia’s  2015  GPU  Technology  Conference,  which  will  deliver  up  to  7  teraflops  of  single 
precision  performance.  We  cite  these  processor  trends,  as  they  exemplify  the  future  of  High  Performance  Computing  (HPC) 
where  the  circumstances  on  the  demand  side  will  eventually  ripen  for  CPRNG’s  innovations. 

In  the  meantime,  while  applications  software  is  adapted  and  optimized  (or  “rewritten”  according  to  IDC),  the  greatest  challenge 
is  to  gain  CPRNG  user  experience.  Toward  that  end,  the  University  of  Texas  at  San  Antonio  (UTSA)  is  well  positioned. 
Unhindered  by  jointly  held  intellectual  property  rights,  UTSA  will  have  the  ability  to  make  our  CPRNG  software  available  not  only 
to  Government  users,  but  also  to  users  throughout  the  UT  System.  Under  sole  ownership,  the  process  of  licensing  CPRNG 
commercially  will  be  streamlined. 

3.2.  Further  Work 

Based on our evaluation of CPRNG and several other generators, CPRNG appears to be a high quality random number
generator suitable for parallel applications that require a large number of independent random number (RN) streams. The ISC test
is a unique test that evaluates correlations among streams without being limited by the number of streams or the number of random
numbers. It can also evaluate the random numbers used by an application on the fly and provide a quality
metric on the correlations among the random numbers used. The CIT framework provides a flexible way to (a) evaluate
new random number generators easily and compare them to existing ones and (b) compare and calibrate new random
number generator test packages against current test packages. The CIT framework is a powerful tool for the design and testing of
new random number generators and test suites.

The research and software produced by this project can be extended by making the CIT framework more accessible to researchers
who use a wide variety of computing platforms, including multicore, GPU and many integrated core (MIC) architectures. Another
direction for further work is to implement CPRNG and some of the other generators for MIC architectures such as Intel Xeon
Phi. Currently, the ISC test can analyze RNs based on a pre-specified grouping of streams and interleaving method. However,
the ISC test can be made even more powerful by recoding it to analyze random numbers consumed by an application in multiple
ways simultaneously.

Appendixes: 

A.  Phase  I  Final  Technical  Report 

B. Paper, Context-Aware Parallel Pseudorandom Number Generators for Large Parallel Computations, 2011 DoD High
Performance Computing Modernization Program (HPCMP) Users Group Conference, Rajendra V. Boppana, June 2011

C. US Patent 8,868,630 B1, entitled Verification of Pseudorandom Number Streams, Inventors Rajendra V. Boppana and Ram
C. Tripathi

D.  US  Patent  Application  13/426,028,  entitled  Generation  of  Distinct  Pseudorandom  Number  Streams  based  on  Program 
Context,  inventor  Rajendra  V.  Boppana 


Technology  Transfer 

Silicon  Informatics  together  with  subcontractor  University  of  Texas  at  San  Antonio  have  engaged  in  outreach  to  several  DoD 
Defense  Supercomputing  Resource  Centers  and  service  laboratories  and  have  organized  demonstrations  to  researchers  at 
ARL  and  NRL.  Our  outreach  extended  to  NASA  and  DoE  laboratories  as  well  as  to  private  corporations.  Most  of  these 
organizations  have  participated  in  the  HPC  User  Forum  (www.hpcuserforum)  which  is  organized  by  IDC  and  convened  twice 
annually  at  various  locations  throughout  the  USA.  During  the  course  of  this  project,  we  have  attended  four  of  the  HPC  User 
Forum meetings and were invited to present at the meeting held in Boston, MA in 2013. We have also attended two IEEE/ACM
Supercomputing conferences, SC13 and SC14, where we met with representatives from Government and private industry. In
addition, we participated in the DoD SBIR/STTR Beyond Phase II conference in San Antonio in December 2014, where we
held several one-on-one meetings with DoD lab, agency and industry representatives.

Technical Progress Report
and Final Report

STTR  Phase  II  Project:  Random  Number  Generation  for  High  Performance  Computing 
Period  of  Performance:  December  20,  2012  —  March  19,  2015 


1.  Accomplishments 

The primary objectives of Phase II of the project were to (a) implement the context-aware
parallel random number generator (CPRNG), developed in Phase I of this project [1],[2],[4],
with a simple application programming interface (API) and the scalability to accommodate
applications running on a large number of processor cores or general purpose graphics
processing unit (GPU) cores; and (b) implement the interstream correlation (ISC) test so that the
quality of the random numbers (RNs) used by applications is evaluated and quality metrics are
reported on demand [3]. Both objectives have been accomplished. The following additional
design and implementation contributions were also made in this project.

A flexible CPRNG-ISC Test (CIT) framework was developed and implemented so that a
third-party tester such as Dieharder or TestU01 can be run alongside the ISC test to corroborate or
compare ISC test results with those from the well-known single-stream test batteries.

The  CPRNG  Library  is  implemented  in  a  flexible  manner  to  facilitate  implementation  and  use  of 
other  random  number  generators  within  the  test  framework  easily  (see  Figure  1). 


[Figure 1 diagram: the Application, the ISC Test, and an Other Tester, each producing Results.]


Figure  1.  CPRNG-ISC  Test  framework.  ISC  Test  can  be  run  concurrently  with  application  and  quality  of 
the  random  numbers  (RNs)  consumed  by  the  application  can  be  provided  periodically  or  upon  the 
completion  of  the  application.  In  addition,  a  third-party  tester  such  as  the  single-stream  offline  test 
packages  could  be  used  to  provide  an  alternate  method  to  assess  the  quality  of  the  RNs. 

CPRNG  Library  is  designed  to  accommodate  a  wide  variety  of  random  number  generators  with  a  simple 
interface  and  compare  their  suitability  for  a  given  application. 

Also, new random number generator test packages can be evaluated by comparing their performance
against the ISC Test or other known test packages.

To demonstrate the flexibility of the CIT framework, we implemented the MLFG generator from the
SPRNG package [7]; drand48, which is available on standard Unix/Linux systems; a parallel RNG
based on cryptographic operations from the family of generators proposed by the D.E. Shaw Group
[12]; and a pathological linear congruential generator (pLCG). In addition, the CPRNG Library
provides access to Intel’s new digital random number generator
(DRNG) and Nvidia’s GPU-based generator MTGP32 [6] when the host system has the
necessary hardware (newer processor chips or GPUs, respectively) to support these generators.

Three versions of CPRNG were implemented: a CPU-based context-free generator, which is used
to produce the results in this report; a CPU-based context-aware generator; and a GPU-based context-free
generator. A context-aware generator automatically, without any changes to the application
code, uses distinct RN streams when the application requests RNs from a stream in
different program contexts.
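
The context-aware behavior described above can be illustrated with a minimal Python sketch. It assumes that a "program context" is identified by the call site (file name and line number) of the request. The class name `ContextAwareRNG`, the hash-based seeding, and the use of Python's `random.Random` as the underlying generator are illustrative assumptions and do not reproduce the CPRNG implementation.

```python
import hashlib
import random
import sys


class ContextAwareRNG:
    """Hands out a distinct stream per call site (hypothetical illustration)."""

    def __init__(self, base_seed):
        self.base_seed = base_seed
        self._streams = {}  # context -> its own random.Random instance

    def _stream_for(self, context):
        # Derive a per-context seed by hashing the base seed with the context.
        if context not in self._streams:
            digest = hashlib.sha256(f"{self.base_seed}:{context}".encode()).digest()
            self._streams[context] = random.Random(int.from_bytes(digest[:8], "big"))
        return self._streams[context]

    def random(self):
        # Assumption: the caller's file name and line number stand in for the
        # program context. sys._getframe is specific to CPython.
        frame = sys._getframe(1)
        context = (frame.f_code.co_filename, frame.f_lineno)
        return self._stream_for(context).random()

    def stream_count(self):
        return len(self._streams)


def demo():
    rng = ContextAwareRNG(base_seed=2015)
    a = [rng.random() for _ in range(3)]  # first call site
    b = [rng.random() for _ in range(3)]  # second call site, separate stream
    return rng.stream_count(), a, b
```

In `demo()`, the application code calls the same `random()` method from two places and receives numbers from two separate streams without any change to the calls themselves. A hashed seed of this kind gives no guarantee of low interstream correlation; it only shows how streams can be selected by context.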


2.  Performance  Analysis 


CPRNG, the new parallel random number generator developed in Phase I of this project, was
implemented within the SPRNG package in Phase I. In Phase II, it was implemented as a standalone
library package with a simple application programming interface. The results given in Figure 2
indicate that, relative to the Phase I implementation, the time to initialize an RN stream decreased
slightly and the time to obtain an RN was reduced by 20-30%. CPRNG generates RNs with very low overhead.


[Figure 2 charts: RN Stream Initialization Time (µs) and RN Generation Time (ns) for MLFG, CPRNG1 and CPRNG2
on Core i7-870 and Xeon E5630 processors.]


Figure 2. Comparison of RN stream initialization and RN generation times for two implementations of
CPRNG. CPRNG2 is the Phase II implementation in the CPRNG Library. CPRNG1 is the Phase I
implementation of CPRNG in the SPRNG package. For comparison, the times for MLFG, a parallel
RNG in the SPRNG package, are shown.


The ISC Test was used to determine the interstream correlations for MLFG and CPRNG in
Phase I. In Phase II, several other random number generators implemented within the CPRNG
Library have been evaluated for interstream correlations using the CPRNG-ISC Test framework.
The results, given in Figure 3, show that CPRNG generates a large number of parallel RN
streams with low interstream correlations.

The CIT framework is used to compare the quality metrics produced by the ISC test, the DR and KS
statistics [8],[9], with the results of the Dieharder [11] and TestU01 [10] test batteries that are commonly used in the
literature. In general, the two test batteries corroborate each other’s test results for a given stream
of RNs. The Ising model simulation [5], which simulates the spread of energy in a 2-D lattice
and for which exact theoretical results are available, is the application we used to corroborate
or refute the results of the various test methods.

For the pathological linear congruential generator (pLCG), which is designed to have high
interstream correlations, the ISC test and the two test batteries indicate significant correlations
among RNs. This is confirmed by the Ising model simulations. On the other hand, drand48, a
sequential generator commonly available on Linux and Unix systems, is reported to have high
correlations by Dieharder and TestU01. However, the ISC test does not indicate any correlations, and the
Ising model simulations confirm the ISC test results.

[Figure 3 plot; panel title: ISC Test: DR-Statistic]

Figure 3. ISC Test of interstream correlations for various random number generators implemented or
accessed through the CPRNG Library. The generators implemented within the CPRNG Library include
mlfg, a generator from the SPRNG package, and crypt, a cryptographic operations-based generator from
a family of generators designed by the D.E. Shaw Group. The generators accessed through the CPRNG Library
include mtgp, a GPU-based RNG by Nvidia, and i-rng, a new RNG by Intel that has hardware
support for high-entropy seed generation and uses cryptographic operations for RN generation.

Two statistical tests, the Donner-Rossner (DR) and Kolmogorov-Smirnov (KS) tests, are used to accept or reject the
hypothesis that the parallel streams extracted from an RNG are not correlated. The significance levels are
0.05 for the DR test and 0.01 for the KS test. The dashed lines indicate the critical value; a
test statistic (DR or KS) that stays below it is consistent with the hypothesis.
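
The accept/reject rule in this caption can be illustrated with a one-sample Kolmogorov-Smirnov check against the uniform distribution on [0, 1). This is a sketch under stated assumptions and is not the ISC test: the function names are hypothetical, the samples are assumed to be values that should be uniform if the hypothesis holds, and the critical value uses the standard large-sample approximation sqrt(-ln(alpha/2)/2)/sqrt(n).

```python
import math


def ks_statistic_uniform(samples):
    """Largest gap between the empirical CDF and the U(0,1) CDF F(x) = x."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


def ks_critical_value(n, alpha):
    """Asymptotic critical value; an approximation intended for large n."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)


def passes_ks(samples, alpha=0.01):
    """True when the statistic stays below the critical value (the dashed line)."""
    return ks_statistic_uniform(samples) < ks_critical_value(len(samples), alpha)
```

With the 0.01 significance level used for the KS test in Figure 3, a sample of 100 values gives a critical value of about 0.163; the hypothesis is rejected only when the statistic reaches or exceeds that line.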

3. Business and Dissemination Plan

The  major  components  supporting  long  term  sustainability  of  CPRNG  include: 

•  Preservation  of  all  the  software  files  and  documentation, 

•  Development  and  growth  of  a  body  of  users,  and 

•  Continuation  of  CPRNG  commercial  licensing  efforts,  with  a  primary  objective  of 
licensing  to  a  strategic  partner  such  as  a  processor  manufacturer  (for  inclusion  in  its  tools 
library),  or  system  manufacturer  or  vendor,  or  a  major  software  provider. 

3.1.  Rationale 

In  order  to  accomplish  the  above,  Silicon  Informatics  plans  to  assign  its  interest  in  CPRNG 
software,  the  “GetCPRNG.com”  domain  and  associated  website  creative  files  to  the  University 
of  Texas  (UT)  System.  To  the  extent  practicable,  Silicon  Informatics  will  continue  to  support 
the  development  of  a  user  community  and  commercial  licensing  efforts. 

The innovations developed during the course of this project are extensive and further ahead of their
time than was anticipated at the outset of Phase II of this project. Despite our extensive outreach
to the government, academic and private sectors, users’ needs appear to be met with
conventional tools, including those that have been introduced since the inception of this project.
One possible explanation is that much of the high performance computing software used for
production runs has yet to be adapted and optimized to run on GPGPU-enhanced machines (see footnote 1).
Despite  these  software  issues,  semiconductor  manufacturers  continue  to  innovate  in  areas  of 
parallelization,  processor-coprocessor  integration  and  memory.  One  example  is  Intel 
Corporation’s  Knights  Landing  processor,  a  “many  integrated  core”  architecture  that  competes 
with  GPGPU  products.  Knights  Landing,  with  first  shipments  expected  in  2015,  will  work  as  a 
host  processor,  capable  of  running  an  OS  and  applications  on  its  own,  while  at  the  same  time 
functioning  as  a  coprocessor.  Another  example  is  Nvidia’s  Titan  X  GPGPU,  announced  at 
Nvidia’s  2015  GPU  Technology  Conference,  which  will  deliver  up  to  7  teraflops  of  single 
precision  performance.  We  cite  these  processor  trends,  as  they  exemplify  the  future  of  High 
Performance  Computing  (HPC)  where  the  circumstances  on  the  demand  side  will  eventually 
ripen  for  CPRNG’s  innovations. 

In  the  meantime,  while  applications  software  is  adapted  and  optimized  (or  “rewritten”  according 
to  IDC),  the  greatest  challenge  is  to  gain  CPRNG  user  experience.  Toward  that  end,  the 
University  of  Texas  at  San  Antonio  (UTSA)  is  well  positioned.  Unhindered  by  jointly  held 
intellectual  property  rights,  UTSA  will  have  the  ability  to  make  our  CPRNG  software  available 
not  only  to  Government  users,  but  also  to  users  throughout  the  UT  System.  Under  sole 
ownership,  the  process  of  licensing  CPRNG  commercially  will  be  streamlined. 


3.2.  Further  Work 

Based on our evaluation of CPRNG and several other generators, CPRNG appears to be a high
quality random number generator suitable for parallel applications that require a large number
of independent random number (RN) streams. The ISC test is a unique test that evaluates correlations
among streams without being limited by the number of streams or the number of random numbers. It
can also evaluate the random numbers used by an application on the fly and
provide a quality metric on the correlations among the random numbers used. The CIT
framework provides a flexible way to (a) evaluate new random number generators easily
and compare them to existing ones and (b) compare and calibrate new random number
generator test packages against current test packages. The CIT framework is a powerful tool for the
design and testing of new random number generators and test suites.

The research and software produced by this project can be extended by making the CIT framework
more accessible to researchers who use a wide variety of computing platforms, including
multicore, GPU and many integrated core (MIC) architectures. Another direction for further
work is to implement CPRNG and some of the other generators for MIC architectures such as Intel
Xeon Phi. Currently, the ISC test can analyze RNs based on a pre-specified grouping of streams and
interleaving method. However, the ISC test can be made even more powerful by recoding it
to analyze random numbers consumed by an application in multiple ways simultaneously.

1 Presentation by IDC, “IDC at SC14” slide 93 of 95, Nov 18, 2014: “software is the #1 roadblock; better
management software is needed, parallel software is lacking for most users, (and) many applications will need a
major redesign.”

Programmatic  issues:  None. 

4.  Schedule  Update 

[Gantt chart: STTR Phase II Project, as of 19 Mar 15]

5. Milestone Update

Milestones completed to date: The fourth and final version of the software was released in July 2015 to
ARL researchers. This version supersedes the prior releases. The initial version of the project website
is active and hosted by Rackspace at the URL getcprng.com.


This  report  is  supplemented  by  a  4-part  Appendix  consisting  of  the  technical  documents 
produced  as  part  of  the  project  and  the  final  report  from  Phase  I  of  this  project. 

Prepared by: Rajendra V. Boppana, Ph.D., P.I., University of Texas at San Antonio, and
Robert Keller, Project Director, Silicon Informatics.

July  22,  2015 


Appendix  A:  Phase  I  Final  Report 

Final Technical Report
CLIN: 0001AF
Silicon Informatics, Inc.
Academic Partner: University of Texas at San Antonio
Contract#: W911NF-11-C-0026


Final  Report 

STTR  Project:  Random  Number  Generation  for  High  Performance  Computing 


1.  Summary  of  Work  Completed 

This  project  has  two  primary  objectives:  (a)  design  and  implement  prototypes  of  highly  scalable, 
high  quality  parallel  random  number  generators  (PRNGs)  for  a  variety  of  computing  and 
programming  models  including  multicore/multithreaded,  message  passing,  and  general  purpose 
graphics  processing  unit  (GPGPU)  models;  (b)  design  and  implement  test  methods  that  evaluate 
the  independence  of  a  large  number  of  parallel  random  number  (RN)  streams  and  provide  easy  to 
use  quality  metrics.  Both  objectives  have  been  accomplished. 

The  rest  of  the  final  report  is  organized  as  follows.  First,  the  main  contributions  are  summarized. 
Descriptions  of  the  work  done  for  various  tasks  that  were  pursued  to  accomplish  the  project 
objectives  are  given  next.  Technical  details  and  performance  data  are  provided  in  two 
attachments:  a  supplementary  report  and  a  technical  paper  that  will  be  presented  at  DoD  HPC 
Users  Group  Conference,  June  2011. 

Main  contributions 

• A new statistical test, called the ISC test, to evaluate interstream correlations of a large
number of RN streams is designed and implemented. The ISC test is a significant
contribution to the state of the art in PRNG testing. It can be used to evaluate billions of
RN  streams  simultaneously  and  obtain  an  overall  quality  metric.  This  test  has  low 
computational  overhead  and  can  be  adapted  for  online  testing — in  which  the  RNs 
consumed  in  an  application  are  analyzed  in  parallel  with  the  application  and  a  quality 
metric  is  provided  at  the  end  of  the  application  execution.  To  the  best  of  our  knowledge, 
this  is  the  first  such  test.  The  ISC  test  identified  potential  correlations  among  the  streams 
of  a  popular  and  widely  used  PRNG  in  the  SPRNG  package.  The  test  results  were  further 
confirmed  with  a  new  DTMC  simulation  application  we  developed  in  this  project. 

• The ISC test is a first-level test method, with Ising model simulations and other
applications forming the next-level test methods. A new application based on the
simulation of a discrete-time Markov chain (DTMC) model is implemented. This
application can be used to test both intrastream correlations and interstream correlations
for a large number of RN streams.

• An online version of the ISC test and additional physical modeling applications such as fracture
analysis, multiscale modeling, and CTH will be added to the test package that will be
implemented in Phase II.

• A new context-aware parallel random number generator (CPRNG) is designed and
implemented. CPRNG is highly scalable and supports applications that require a large
number of distinct RN streams that cannot be predicted at the beginning of the execution. The
current version is based on the multiplicative lagged Fibonacci generator (MLFG)
technique. Additional CPRNGs based on other RN generation techniques will be
designed and implemented in Phase II.

•  CPRNG  implementation  supports  various  computing/programming  models:  sequential, 
multicore/multithreaded,  message  passing  (MPI),  and  GPGPU.  We  tested  the  functional 
correctness  of  all  these  implementations  extensively.  The  prototype  CPRNG 
implementation  is  free  of  memory  leaks  and  race  conditions;  it  can  supply  billions  of  RN 
streams  easily. 

•  CPRNG  prototype  implementation  is  tuned  extensively  for  efficient  initialization  of  RN 
Streams  and  generation  of  RNs.  With  respect  to  timing  costs,  CPRNG  compares  well 
with  the  basic  MLFG,  which  does  not  provide  the  same  level  of  flexible  and  scalable 
generation  of  streams  dynamically. 

• Several code optimizations that reduce the overheads and improve the speed of CPRNG
have been identified. With these optimizations incorporated (in the Phase II implementation),
CPRNG will perform faster with less overhead.

Description  of  work  completed 

To  accomplish  the  project  objectives,  several  tasks  were  identified  and  pursued  during  the 
project  period.  The  work  completed  for  each  proposed  task  and  the  contributions  are  described 
below. 

A.  Comparison  and  assessment  of  current  parallel  random  number  generators  (PRNGs)  and  their 
evaluation  techniques. 

As part of this task, we identified several PRNG software packages and sequential test packages.
The SPRNG package from Florida State University, the Dieharder test package from Duke
University, and the TestU01 package from Université de Montréal, obtained for this task, are
extensively used in the remainder of the project work.

Regarding the currently available PRNGs, we identified the multiplicative lagged Fibonacci
generator (MLFG), a parameterized approach to generate independent parallel RN streams, as
the most suitable candidate for the design of highly scalable and high quality context-aware
parallel random number generators (CPRNGs). We used version 2 of the SPRNG package, which
includes 6 PRNGs, as the platform on which we implemented CPRNGs. The Ising model
simulations (both Metropolis and Wolff algorithms) implemented in SPRNG have been
extensively used to test and compare CPRNGs with the MLFG and other generators in SPRNG.
B. Implementation of PRNGs on multicore and GPGPUs

SPRNG provides an MPI (Message Passing Interface)-based interface for parallel applications
that use MPI for interprocess communication. In addition, SPRNG is
implemented in such a way that multithreaded programs can also use the package transparently.
However, the burden is placed on the application developer/user to ensure that the total number of
streams used is known and that the streams are allocated to different threads/processes suitably.

We  developed  a  test  code  to  evaluate  the  time  taken  to  initialize  a  new  RN  stream  and  the  time  to 
generate  a  random  number  from  a  stream  using  Intel’s  timestamp  counters.  The  timing  tests  are 
repeated  several  times  and  averaged  to  obtain  representative  timing  data. 
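The timing procedure described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's wall-clock timer instead of the CPU timestamp counter, Python's built-in `random.Random` as a stand-in for an SPRNG stream, and a hypothetical `TICKS_PER_NS` constant to convert nanoseconds to clock ticks. The function names are ours, not part of SPRNG or of the project's test code.

```python
import random
import time

# Assumed clock rate used only to express results in ticks (hypothetical constant).
TICKS_PER_NS = 2.93


def mean_cost_ns(action, repeats=1000):
    """Average wall-clock cost of one call to action(), in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(repeats):
        action()
    return (time.perf_counter_ns() - start) / repeats


def time_generator(make_stream, repeats=1000):
    """Return (init_ticks, draw_ticks) for a stream factory taking a seed."""
    # Cost of initializing a fresh stream.
    init_ns = mean_cost_ns(lambda: make_stream(12345), repeats)
    # Cost of drawing one number from an already initialized stream.
    stream = make_stream(12345)
    draw_ns = mean_cost_ns(stream.random, repeats)
    return init_ns * TICKS_PER_NS, draw_ns * TICKS_PER_NS


if __name__ == "__main__":
    init_ticks, draw_ticks = time_generator(random.Random)
    print(f"init ~{init_ticks:.0f} ticks, draw ~{draw_ticks:.0f} ticks")
```

Averaging over many repetitions, as the text describes, smooths out timer resolution and scheduling noise; the absolute numbers from an interpreted language are not comparable to the tick counts reported for the C implementation.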

The Mersenne twister (MT), in particular the Nvidia-developed MTGP generator, is extensively used by
parallel applications that use GPUs. However, MTGP is a single-stream generator; it needs to be
segmented, and the segments must be allocated to different threads. Our investigation did not find a
truly scalable PRNG with a small state space and the highly independent RN streams needed for
large-scale GPGPU computing.

C.  Evaluation  of  PRNGs  using  known  statistical  and  application-based  tests 

Single-stream test methods have been extensively studied in the literature. Many single-stream tests
are implemented in various test packages, including the Dieharder and TestU01 packages,
which we used extensively. Parallel random number streams are interleaved using the perfect
shuffle pattern to create a single RN stream, and single-stream tests are used for statistical
evaluation of a PRNG. A single-stream test package contains 20 different types of basic tests
(which may be repeated with different parameters to create up to 150 test instances) and gives
pass/fail status for each test applied to the interleaved stream. This provides a vector of pass/fail
information that is hard to use for comparisons of different PRNGs.

We tested the six generators in SPRNG using Dieharder and TestU01. All perform well, with
only an occasional failure for one of the tests. These tests consume a few billion RNs from the
test stream. Therefore, they are not suitable for testing a large number of parallel RN
streams; if a billion streams are interleaved to form a single stream, then these tests only examine
a few numbers from each stream, which may not be enough to assess the inter-stream
correlations. On the other hand, if a billion RN streams are partitioned into several sets with each
set consisting of a small number of RN streams, and single-stream tests are applied on each set,
then these tests will take several hundred hours on a desktop machine and provide multiple
vectors of pass/fail information that will be hard to combine into an easy-to-understand quality
metric.

Regarding application-based testing, the Ising model simulation codes in SPRNG are the best
known and most commonly used applications for PRNG evaluations.
D. Development of new statistical tests to quantify inter-stream correlation

We implemented an interstream correlation (ISC) test to evaluate the correlations among a large
number of RN streams. This test requires parallel RN streams to be combined (using perfect
shuffle interleaving or a biased interleaving) into a bivariate RN stream (RNs 1, 3, ... form the X
variates and RNs 2, 4, ... form the Y variates). This bivariate RN stream is transformed into a bivariate
normal RN stream, and the correlation coefficient, $r$, between the X and Y variates is computed.
Several sets of RN streams are used to compute several $r$'s. Collectively, these $r$'s are the samples
that can be used to estimate $\rho$, the true common correlation coefficient among the parallel RN
streams generated by the PRNG being evaluated. We used the Donner and Rosner test method
(DR-test, Applied Statistics J., vol. 29, no. 1, 1980) to combine the $r$'s and obtain the test statistic,
denoted $t_H$, which is a standard normal random variate. This can be used to test the null
hypothesis $H_0: \rho = 0$. Large absolute values of $t_H$ lead to the rejection of the null
hypothesis and the acceptance of the alternative hypothesis $H_1: \rho \neq 0$. For a significance level
$\alpha = 0.05$, absolute values of $t_H$ above 1.96 lead to the rejection of the claim that the parallel RN
streams are independent; the probability that the rejection is erroneous is $\alpha = 0.05$. One could
use different significance levels: for $\alpha = 0.02$, absolute values of $t_H$ above 2.33 lead to
rejection of the claim of independence of RN streams with only a 0.02 probability of being wrong.
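As an illustration of how several sample correlations can be pooled into a single standard normal statistic, the sketch below uses Fisher's z-transform with equal weights. This is a common textbook way to combine correlation coefficients and is an assumption on our part; it is not claimed to be the exact statistic $t_H$ of the DR-test. The function names and the `pairs_per_r` parameter (the number of bivariate pairs behind each $r$) are hypothetical. The critical values 1.96 and 2.33 quoted above are reproduced by `critical_value`.

```python
import math
from statistics import NormalDist


def pooled_z_statistic(rs, pairs_per_r):
    """Combine sample correlations into one approximately N(0, 1) statistic.

    Under H0 (rho = 0), atanh(r) is approximately N(0, 1 / (m - 3)) for m pairs,
    so the weighted sum divided by its standard deviation is standard normal.
    """
    weight = pairs_per_r - 3
    total = sum(weight * math.atanh(r) for r in rs)
    return total / math.sqrt(weight * len(rs))


def critical_value(alpha):
    """Two-sided standard normal critical value for significance level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_independence(rs, pairs_per_r, alpha=0.05):
    """True if the pooled statistic exceeds the two-sided critical value."""
    return abs(pooled_z_statistic(rs, pairs_per_r)) > critical_value(alpha)
```

For example, `critical_value(0.05)` rounds to 1.96 and `critical_value(0.02)` rounds to 2.33, matching the thresholds in the text.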

We developed a Kolmogorov-Smirnov test (KS-test) on the distribution of $r$'s. In this test, the
KS-test statistic, $D_{max}$, computed using the $r$'s must be less than the critical value, $D_{\alpha,n}$, for
significance level $\alpha$ and $n$, the number of $r$'s used.
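A minimal sketch of computing a KS distance for a set of $r$'s is given below. It assumes that, under independence, each $r$ is standardized with Fisher's z-transform and compared against the standard normal distribution; the report does not state which reference distribution was used, so this choice is an assumption. The critical value is left as a caller-supplied parameter rather than computed here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def ks_dmax(rs, pairs_per_r):
    """KS distance between standardized Fisher z-values and the N(0, 1) CDF."""
    scale = math.sqrt(pairs_per_r - 3)
    zs = sorted(scale * math.atanh(r) for r in rs)
    n = len(zs)
    cdf = NormalDist().cdf
    d_max = 0.0
    for i, z in enumerate(zs):
        f = cdf(z)
        # The empirical CDF jumps from i/n to (i + 1)/n at z; check both sides.
        d_max = max(d_max, (i + 1) / n - f, f - i / n)
    return d_max


def passes_ks(rs, pairs_per_r, critical):
    """True if the KS distance is below the supplied critical value."""
    return ks_dmax(rs, pairs_per_r) < critical
```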

A  preliminary  version  of  this  test  was  described  in  Monthly  Report  3  (January  2011). 

We used the ISC test with these two test metrics extensively to evaluate the correlations among
the RN streams of a PRNG. This is a highly scalable test. We tested up to 1.5 billion RN
streams with at least 100 numbers taken from each stream. To the best of our knowledge, this is the
first time a billion RN streams have been tested simultaneously and a single figure of merit given.

In our test process, we identified significant correlations among RN streams of MLFG, a PRNG
in the SPRNG package. Both DR-test and KS-test statistics, $t_H$ and $D_{max}$, give very high
values, leading to the rejection of the claims of independence of the RN streams generated by this
PRNG. MLFG fails the ISC test consistently when 15 million or more streams are considered.

We confirmed this potential problem with MLFG using a new application we developed in this
project. This application simulates a discrete-time Markov chain (DTMC) with an absorbing
state. (The DTMC estimates the number of packet transmissions, which are the steps or state
transitions in the model, that it takes for a node to suspect its next hop node of dropping its packets in
a multi-hop wireless network.) Compared to the Ising model simulations, the DTMC model can use
a large number of RN streams much more quickly, and the theoretical values can be calculated
easily.

When  1.5  million  or  more  streams  are  used,  MLFG  fails  to  match  the  theoretical  estimation. 
Since  the  application  is  a  simulation  of  the  model,  not  the  actual  wireless  network,  using  a  good 
PRNG  should  lead  to  quick  convergence  of  simulation  estimates  to  match  the  theoretical 
estimates. 

E.  Development  of  new  scalable  PRNGs 

A highly scalable context-aware parallel random number generator (CPRNG) that can provide
distinct RN streams automatically for different contexts was designed. The first version is based on
MLFG but uses different initialization methods. This allows a large number of distinct RN
streams to be requested dynamically with very little communication cost. Streams within the
initially specified limit do not incur any interprocess communication or thread
synchronization/serialization; beyond that limit, an application can request new RN streams with
only two interprocess communication messages or one mutex lock access. An extensive description
of the design of CPRNG is given in Monthly Reports 4 and 5.
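The allocation behavior described above can be illustrated for the multithreaded case with the sketch below. It is not the CPRNG design, which is documented in Monthly Reports 4 and 5 and not reproduced here; the class name, the block size, and the stream numbering are all hypothetical. The point shown is only that streams inside a worker's pre-assigned range need no synchronization, while a request beyond that range costs a single lock acquisition.

```python
import threading


class StreamIdAllocator:
    """Hands out distinct stream IDs; locks only when a worker's block runs out."""

    def __init__(self, num_workers, initial_per_worker, block_size=1024):
        self._lock = threading.Lock()
        self._block_size = block_size
        # IDs below this mark are pre-partitioned among workers (no locking).
        self._next_shared = num_workers * initial_per_worker
        # Per-worker [next_id, end) range, touched only by its owning worker.
        self._ranges = {
            w: [w * initial_per_worker, (w + 1) * initial_per_worker]
            for w in range(num_workers)
        }

    def new_stream_id(self, worker):
        rng = self._ranges[worker]
        if rng[0] == rng[1]:
            # Range exhausted: one synchronized step reserves a new block.
            with self._lock:
                rng[0] = self._next_shared
                self._next_shared += self._block_size
            rng[1] = rng[0] + self._block_size
        stream_id = rng[0]
        rng[0] += 1
        return stream_id
```

In an MPI setting the lock would be replaced by a request/reply message pair to whichever process owns the shared counter, which corresponds to the two messages mentioned in the text.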

F.  Preliminary  implementation  and  evaluation  of  CPRNG 

We implemented CPRNG in the SPRNG package. It can be used by sequential applications,
multicore/multithreaded applications, MPI-based parallel applications, and GPGPU-based
applications.

The functional correctness of the implementation for all these scenarios was tested extensively
using a parallel application (denoted allreduce) that uses multiple RN streams and multiple
numbers from each stream, and computes their overall sum modulo 100. We used allreduce to test
as many as 1 million RN streams and ensured that CPRNG provides consistent RNs regardless of
the number of processes/threads used.
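The idea behind this consistency check can be sketched as follows. The sketch is an illustration only: it simulates the workers sequentially rather than with real threads or MPI processes, and it uses `random.Random` seeded by the stream ID as a stand-in for a parameterized RN stream. The function names are hypothetical and do not correspond to the actual allreduce test code.

```python
import random


def stream_numbers(stream_id, count):
    """Stand-in for a parameterized stream: one seeded generator per stream ID."""
    gen = random.Random(stream_id)
    return [gen.randrange(1_000_000) for _ in range(count)]


def allreduce_checksum(num_streams, numbers_per_stream, num_workers):
    """Sum all RNs modulo 100, with streams dealt round-robin to workers."""
    partial_sums = []
    for worker in range(num_workers):
        partial = 0
        # Each worker handles streams worker, worker + W, worker + 2W, ...
        for stream_id in range(worker, num_streams, num_workers):
            partial += sum(stream_numbers(stream_id, numbers_per_stream))
        partial_sums.append(partial)
    # The reduction step: combine the per-worker partial sums.
    return sum(partial_sums) % 100


if __name__ == "__main__":
    results = {allreduce_checksum(1000, 10, w) for w in (1, 2, 4, 8)}
    print("consistent" if len(results) == 1 else "MISMATCH")
```

If the numbers delivered by a stream depended on which worker used it, the checksum would change with the worker count; an identical checksum across worker counts is the property being tested.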

We evaluated the timing costs of initialization and RN generation. The initialization cost of
CPRNG is about 26,000 clock ticks, which is about the same as that of MLFG in the SPRNG
package. The RN generation cost is about 3 clock ticks more (23 vs. 20 ticks on a machine with an
Intel quad-core i7-870 CPU and 20 vs. 17 ticks on a machine with an Intel Xeon E630 CPU).

We evaluated the quality of CPRNG using the ISC test, Ising model simulations, and DTMC
model simulations. The results for the Ising model simulations, given in Monthly Report 5,
show that CPRNG performs about the same as the MLFG implemented in the SPRNG package.
However, these simulations use at most 256 RN streams.

The ISC test is used for further evaluation. With up to 1.5 billion streams used, CPRNG
performed well, with test statistics below the corresponding critical values in all but one instance.
Even in that scenario, which used 1.5 billion RN streams, the KS-statistic was only slightly higher
than the corresponding critical value, and the DR-statistic was well below its corresponding critical
value. Further testing with the DTMC application showed that using CPRNG allows the
simulation results to converge to the theoretical values much more quickly than using MLFG.

Technical  details  and  performance  data  are  given  in  a  technical  paper  submitted  as  a  supplement 
to  this  report. 

Business  and  Dissemination  Plan 

One of ARO’s objectives in supporting this research is to ensure that the PRNG software that
results from this research is relevant to and used in military and commercial simulation
applications. Our proposal for Phase II of this project sets forth a detailed plan to introduce the
new context-aware parallel random number generator (CPRNG) to the high performance
computing (HPC) community and make it available to military, academic and commercial users.

Major  elements  of  this  plan  include: 

1. Communication to the HPC user community. The first such communication will be a paper
presented by Rajendra Boppana at the HPCMP Users Group Conference on June 23,
2011. The paper is entitled: “Context-Aware Parallel Pseudorandom Number
Generators for Large Parallel Computations.” Other presentation opportunities include
SC11 (Seattle, November 2011, http://sc11.supercomputing.org/), SC12 (November
2012) and IDC’s HPC User Forum in April 2012. Please note that it might be best to
introduce our commercial version of the CPRNG software through a paper/presentation
at the SC12 conference.

UTSA  and  Silicon  Informatics  will  interact  with  and  provide  the  prototype  software  to 
select  HPC  users  and  parallel  application  developers  to  test  the  usability  and  quality  of 
the  random  numbers  generated  by  CPRNG  and  to  evaluate  the  effectiveness  of  the  online 
ISC  test  method.  These  evaluations  will  be  used  to  refine  the  prototype  prior  to  a  more 
general  release  to  the  HPC  community. 

2.  Creation  of  a  long-term  sustainability  plan,  the  product  of  research  undertaken  by  Silicon 
Informatics,  KEYW  Corporation  and  the  UTSA  Center  for  Innovation  and  Technology 
Entrepreneurship.  The  plan  will  identify  ways  to  reach  the  broadest  set  of  military, 
academic  and  commercial  users  while  generating  sufficient  revenue  to  ensure  that 
availability  of  the  CPRNG  software  is  sustainable  over  the  long  term. 

3.  Release  of  prototype  version  of  the  CPRNG  software,  complete  with  documentation,  for 
evaluation  and  implementation  at  US  Government  HPC  centers,  including  DoD  Major 
Shared  Resource  Centers. 

4.  Development  of  a  website  that  will  facilitate  distribution  and  support  of  the  software  for 
military,  academic  and  commercial  users. 
5. Release of a fully-robust, commercial version of the CPRNG software.

6.  Granting  royalty-based  sublicense  rights  that  enable  the  CPRNG  software  to  be  bundled 
and/or  integrated  with  other  applications  software. 

Programmatic  issues:  None. 
2. Schedule Update

[Schedule chart: timeline bars for Month 1 through Month 6 are not recoverable from the source]

Task  Description                                               Completion Status

A     Comparison and assessment of current PPRNGs and           100%
      their evaluation techniques

B     Implementation of PPRNGs on multi-core CPUs and           100%
      GPGPUs

C     Evaluation of PPRNGs using known statistical and          100%
      application-based tests

D     Development of new statistical tests to quantify          100%
      inter-stream correlation

E     Development of new scalable PPRNG algorithms              100%

F     Preliminary implementation and evaluation of new          100%
      PPRNGs

G     Phase I final report, including Phase II work plan        100%

3.  Milestone  Update 

Milestones  completed  to  date:  Tasks  A  through  G. 

Milestones  expected  to  be  completed  during  the  next  reporting  period:  None. 
Milestones  expected  to  be  missed  during  the  next  reporting  period:  None. 


Prepared  by: 

Rajendra  V.  Boppana,  Ph.D.,  P.I.,  University  of  Texas  at  San  Antonio  (technical  section)  and 
Robert  Keller,  Project  Director,  Silicon  Informatics  (Business  plan) 

May  23,  2011 
Final Report Supplement

STTR  Project:  Random  Number  Generation  for  High  Performance  Computing 


This  document  supplements  the  final  report  for  the  project  by  providing  test  data  and  brief 
explanations  of  the  same. 

1.  RNG  Timing  Tests 

CPRNG is the new random number generator (RNG) designed in this project. MLFG, ALFG,
and ALFG17 are RNGs in the SPRNG package. CPRNG is implemented in the SPRNG
package. Any parallel application currently designed to use SPRNG generators can use CPRNG
by changing the RNG code to 9. No additional application modifications are needed.

Two computers, a desktop computer with an Intel Core i7-870 CPU and a rack server with a Xeon
E630 CPU, are used to estimate the time required for initialization of a random number (RN)
stream and the time taken to generate a single random number from an already initialized stream.
The times are given in clock ticks (2.93 ticks/ns for the Core i7-870 and 2.53 ticks/ns for the Xeon
E630 machine). The initialization costs of CPRNG are about the same as those of MLFG, on
which CPRNG is based. The cost of generating an RN is about 3 ticks higher compared to
MLFG owing to the additional processing needed for context-aware RN generation. This can be
easily eliminated if the application does not require context-aware RN generation.
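For orientation, the tick counts can be converted to wall-clock time using the stated clock rates. The worked conversion below uses the approximate figures quoted in the main report (about 26,000 ticks for initialization; 23 vs. 20 ticks per RN on the Core i7-870 and 20 vs. 17 ticks on the Xeon machine); the resulting times are rounded and are only as accurate as those approximate tick counts.

```latex
\begin{align*}
t_{\text{init}}^{\text{i7}} &\approx \frac{26{,}000\ \text{ticks}}{2.93\ \text{ticks/ns}} \approx 8{,}870\ \text{ns} \approx 8.9\ \mu\text{s} \\
t_{\text{gen}}^{\text{i7}}(\text{CPRNG}) &\approx \frac{23}{2.93} \approx 7.8\ \text{ns}, &
t_{\text{gen}}^{\text{i7}}(\text{MLFG}) &\approx \frac{20}{2.93} \approx 6.8\ \text{ns} \\
t_{\text{gen}}^{\text{Xeon}}(\text{CPRNG}) &\approx \frac{20}{2.53} \approx 7.9\ \text{ns}, &
t_{\text{gen}}^{\text{Xeon}}(\text{MLFG}) &\approx \frac{17}{2.53} \approx 6.7\ \text{ns}
\end{align*}
```

The 3-tick difference attributed to context-aware processing therefore corresponds to roughly 1.0 ns on the Core i7-870 and 1.2 ns on the Xeon machine.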


[Figure: RN Stream Initialization Time (clock ticks) for CPRNG, MLFG, ALFG, and ALFG17 on the Core i7 870 and Xeon E630 machines]

[Figure: RN Generation Time (clock ticks) for CPRNG, MLFG, ALFG, and ALFG17 on the Core i7 870 and Xeon E630 machines]


The CPRNG implementation is free of leaks, is multithread safe, and works seamlessly with
MPI-based applications. The GPU version of CPRNG is implemented as a different generator
(with RNG code 10) with some restrictions on features: no context-awareness, and the maximum
number of streams needed by the application must be specified at the beginning of the program
execution. The CPU version of CPRNG provides context-awareness, the ability to use distinct
streams automatically for different contexts, and a nearly unlimited number of distinct RN streams.
2. ISC Tests for Inter-stream Correlations

The ISC test is applied on several sets of RN streams. The RN streams in a set are interleaved
using the perfect shuffle or biased interleaving method. Consider three RN streams A, B and C with
RNs, respectively, $a_1, a_2, a_3, \ldots$, $b_1, b_2, b_3, \ldots$, and $c_1, c_2, c_3, \ldots$. In perfect shuffle
interleaving, a new stream $a_1, b_1, c_1, a_2, b_2, c_2, a_3, \ldots$ is created. In biased interleaving,
$a_1, b_1, a_2, c_1, a_3, b_2, a_4, \ldots$ is created. The RNs in the odd numbered positions form the X
variates and the RNs in the even numbered positions form the Y variates. These are transformed
into normal bivariates using the Box-Muller transform. The correlation coefficient, $r$, for the
bivariate pairs is computed. This is repeated several times to obtain multiple $r$'s. In our tests, we
used 1500 sets of random number streams with the set size varied from 10 to 1,000,000 to obtain
$r_1, r_2, \ldots, r_{1500}$.

These $r$'s are aggregated using a well-developed test method such as the Donner and Rosner test
(DR-test) or the Kolmogorov-Smirnov test (KS-test), and a test statistic is obtained. The statistic for
the DR-test is denoted as $t_H$ and the statistic for the KS-test as $D_{max}$. For each test, there is a critical
value that is computed based on the desired significance level and the number of $r$'s used. For the
DR-test at a significance level of 0.05, the critical value is 1.96 provided the number of bivariate
pairs used to calculate each $r$ is large. For the KS-test, at a significance level of 0.01, the critical
value is 0.0274 when the number of $r$'s used is 1500. If a test statistic is significantly above the
critical value, then the RN streams generated by the PRNG are likely to have significant
interstream correlations.

The two graphs below give the results of the two tests for shuffle-interleaving of RN streams.
When the set size is 1,000,000, a million streams are used to obtain a single $r$, and a total of 1.5
billion streams are used to obtain the 1500 $r$'s used to calculate the test statistic. The dashed line
indicates the critical value for that test. Our results indicate that MLFG (the built-in random
number generator in the SPRNG package) fails both the DR- and KS-tests for test configurations
that use 1.5 million or more streams. On the other hand, CPRNG, which is based on the
same theoretical foundation as MLFG, performs well; it narrowly fails the KS-test only
for the largest test configurations we used.
3. Application-based Tests

Ising  model  simulations 

We have tested the prototype CPRNG with the Ising model simulations for a 16x16 lattice
based on the Metropolis and Wolff algorithms. A distinct RN stream is used for each lattice point.
The results for absolute error in specific heat vs. the standard deviation are shown for the
Metropolis and Wolff algorithms in the graphs below. We used the same parameters indicated in
the paper by Srinivasan et al., “Testing parallel random number generators,” Parallel Computing
2003: 16x16 lattice, 1000-word blocks, 1 million blocks with the first 100 blocks discarded.
(These graphs are revised versions of the graphs presented in Monthly Report 5.)


In these simulations, the absolute error (the difference between the theoretical calculations and
the simulation values) of specific heat or energy is compared to the standard error of the same
metric (1.96 times the standard deviation of the simulation values) at a significance level of 0.05.
That is, up to 5% of the error points may be greater than the standard error and lie above the
cut-off line indicated in the graph. The MLFG results in the left graph are exactly the same as the
ones presented in Fig. 6 of the paper by Srinivasan et al. These results show that CPRNG is no
worse than MLFG for this test.

Markov  model  simulations 

We  also  implemented  a  new  test  based  on  the  simulation  of  a  discrete-time  Markov  chain 
(DTMC)  that  models  the  time  it  takes  a  node  to  suspect  its  next  hop  of  dropping  packets  based 
on  transmission  overhearing  in  wireless  ad  hoc  networks.  The  DTMC  estimates  the  number  of 
packet  transmissions,  which  are  the  steps  or  state  transitions  in  the  model,  it  takes  for  a  node  to 
suspect  its  next  hop  node  of  dropping  its  packets  in  a  multi-hop  wireless  network.  The  DTMC 
has  an  absorbing  state  denoting  the  state  in  which  the  next  hop  is  suspected  and  L  transient 
states,  where  L  is  the  threshold  to  suspect  the  next  hop  node.  Since  the  application  is  a 
simulation  of  the  model,  not  the  simulation  of  the  actual  wireless  network,  using  a  good  PRNG 
should  lead  to  quick  convergence  of  simulation  estimates  to  match  the  theoretical  estimates. 
Based on several test runs using the DTMC application, we observed that simulations that use
MLFG do not converge as rapidly as the simulations that use CPRNG.

Compared to the Ising model simulations, DTMC model simulations can use a large number of
RN streams much more quickly, and the theoretical values can be calculated more easily. In
fact, we implemented code within the simulation program so that the appropriate theoretical
values can be calculated based on the test parameters prior to a simulation. With an appropriate
choice of parameters, the DTMC application can use a large number of RN streams and/or a large
number of RNs from each stream. If a simulation is repeated k times, and there are L transient
states, it is natural to use L distinct RN streams in each simulation run, or a total of kL RN
streams for the entire simulation. By changing the threshold L, the number of RNs consumed in
a simulation run can be increased.

The results of simulations for various threshold values L for a scenario are given in the following
graph. For each threshold value, MLFG or CPRNG is used to simulate the Markov model to
determine the number of steps taken to reach the absorbing state. This is repeated 10,000 times,
and the average number of steps taken to reach the absorbing state is estimated. This estimate is
compared with the theoretical calculations by calculating the absolute deviation as a percentage
of the theoretical value. The cut-off point is 1%. If the deviation is above 1%, then the simulation
is considered not to have converged. For the four tests we conducted, simulations using MLFG
converged in two out of four scenarios, while the simulations using CPRNG converged in all four
cases.
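The convergence check described above can be illustrated with a toy absorbing chain. The transition structure of the actual DTMC is not given in this supplement, so the model below is a hypothetical stand-in: in each of the L transient states the chain advances one state with probability p and otherwise resets to state 0, which gives a closed-form expected absorption time of $(1 - p^L) / ((1 - p)\,p^L)$. Seeded `random.Random` objects stand in for distinct RN streams, with one stream per transient state as suggested in the text.

```python
import random


def theoretical_steps(p, L):
    """Expected steps to absorption when each step advances w.p. p, else resets."""
    return (1.0 - p ** L) / ((1.0 - p) * p ** L)


def simulate_once(p, L, streams):
    """One run; streams[s] supplies the RNs consumed in transient state s."""
    state, steps = 0, 0
    while state < L:
        state = state + 1 if streams[state].random() < p else 0
        steps += 1
    return steps


def deviation_percent(p, L, runs=10_000, base_seed=0):
    """Absolute deviation of the simulated mean from theory, in percent."""
    total = 0
    for k in range(runs):
        # L distinct streams per run, runs * L streams overall (seeded stand-ins).
        streams = [random.Random(base_seed + k * L + s) for s in range(L)]
        total += simulate_once(p, L, streams)
    theory = theoretical_steps(p, L)
    return abs(total / runs - theory) / theory * 100.0


if __name__ == "__main__":
    for L in (2, 4, 6):
        dev = deviation_percent(0.5, L)
        print(L, f"{dev:.2f}%", "converged" if dev <= 1.0 else "not converged")
```

Because the theoretical value is available in closed form, the deviation can be computed before or alongside each simulation, which mirrors the built-in theoretical calculation mentioned earlier in this section.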


[Figure: DTMC Simulation Results vs. Theory]


This  application  is  promising,  but  further  investigation  is  needed  to  understand  its  usefulness  in 
testing  the  correlations  among  RN  streams. 
Abstract

Design and testing of parallel pseudorandom number generators (PRNGs) that generate millions of parallel random
number (RN) streams needed for large parallel computations is a nontrivial task if: a) the number of parallel streams
is not fixed at the beginning of the program execution, and they are to be generated in a distributed manner with low
communication overhead; and b) the correlations among the parallel streams must be very low. Furthermore, the current
PRNGs require the user to manage the number of streams and their initialization, which can be onerous if each process
or thread of a parallel application consumes RNs at multiple locations and, for better randomization, distinct RN streams
must be used in each instance. In this paper, both problems are addressed using context-aware PRNGs. In this approach,
each request for an RN stream by a process/thread results in the allocation of a large set of RN streams, so that each distinct
program statement that calls for RN generation (denoted RN context) is served with a distinct RN stream taken from the RN
streams assigned to that process. A prototype context-aware parallel random number generator (CPRNG) based on the
multiplicative lagged Fibonacci generator is implemented for automatic RN stream generation based on RN contexts. A
new parallel statistical test, called the inter-stream correlation (ISC) test, is designed and implemented to assess the degree
of independence among a large number of parallel RN streams and provide an easy-to-use quality metric. Preliminary
results indicate that the prototype CPRNG provides high-quality RN streams, and that the ISC test promises to be a
highly effective test to assess correlations among a large number of RN streams.

1.  Introduction 

Many scientific computing applications, business and finance applications, and complex systems modeling and
analysis techniques use random number generators¹ (RNGs) extensively for simulations of various likely scenarios and
estimations of potential outcomes. Often, these applications are highly scalable and can take advantage of the availability
of thousands of computing cores on heterogeneous systems comprising multi-core processors (CPUs) and highly parallel
general-purpose graphics processing units (GPGPUs), provided that suitable parallel random number generators (PRNGs)
are available to simultaneously feed thousands of computing streams with high-quality random number (RN) streams with
low intra- and inter-stream correlations.

We  present  context-aware  parallel  random  number  generators  (CPRNGs)  based  on  a  new  approach  to  allocate  and 
manage  RN  streams  by  parameterized  random  number  generators  that  can  generate  virtually  unlimited  numbers  of  distinct 
RN  Streams.  Lagged  Fibonacci  generators  (LFGs),  which  generate  a  new  RN  by  applying  an  arithmetic  or  logic  operation 
on  two  or  more  previously  generated  RNs,  can  provide  a  large  number  of  distinct  RN  streams,  with  each  stream  having  a 
large  cycle — the  number  of  RNs  that  can  be  used  before  the  sequence  repeats.  A  prototype  CPRNG,  based  on  multiplicative 
lagged  Fibonacci  generator  (MLFG),  is  implemented  and  evaluated.  CPRNG  provides  two  new  features  that  a  basic  MLFG 
does  not  provide. 

• CPRNG uses the program context, in which a request for a random number is made, to automatically select and
use distinct RN streams for distinct contexts.

• A typical PRNG requires the application to declare the maximum number of RN streams used in an execution run,
and requires the number of distinct RN streams requested to stay within this limit. However, this can be a significant
constraint for applications that may spawn additional processes during the execution, based on the intermediate
results, and use an unpredictable number of RN streams.

• CPRNG relaxes this constraint and allows applications to request a virtually unlimited number of RN streams beyond
any initially-specified RN stream limit.

¹ The random number generators we consider in this paper are pseudorandom number generators. For easier description,
however, we simply refer to them as random number generators.
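The notion of selecting a stream by program context can be illustrated with the sketch below, which treats the calling source location (file name and line number) as the context. This is a hypothetical illustration rather than the CPRNG mechanism: it relies on the CPython-specific `sys._getframe`, uses seeded `random.Random` objects as stand-in streams, and numbers streams in order of first use, all of which are assumptions made for the example.

```python
import random
import sys


class ContextStreams:
    """Maps each calling location (file, line) to its own generator."""

    def __init__(self, base_seed=0):
        self._base_seed = base_seed
        self._streams = {}

    def random(self):
        # Frame of the statement that asked for a random number.
        caller = sys._getframe(1)
        context = (caller.f_code.co_filename, caller.f_lineno)
        stream = self._streams.get(context)
        if stream is None:
            # Hypothetical numbering: the next unused stream index.
            stream = random.Random(self._base_seed + len(self._streams))
            self._streams[context] = stream
        return stream.random()

    def num_contexts(self):
        """Number of distinct contexts (call sites) seen so far."""
        return len(self._streams)
```

With this arrangement, two statements that each call `random()` inside the same loop draw from two different streams without the application naming or managing either stream.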

Another  problem  addressed  in  this  work  is  the  evaluation  of  intra-stream  and  inter-stream  correlations — i.e.,  the  quality 
of  the  random  numbers  generated.  Several  excellent  statistical  tests11111  are  available  to  test  intra-stream  correlations  of 
a sequential RNG. Test packages such as Dieharder[16] and TestU01[15] run a battery of such tests on an RN stream and
provide  pass/fail  results  from  each  test.  If  an  RN  stream  fails  any  of  the  tests,  then  additional,  more  detailed  tests  are 
conducted.  Otherwise,  it  is  assumed  that  the  RN  stream  is  free  of  intra-stream  correlations  with  very  high  probability. 

To  assess  the  quality  of  a  PRNG,  several  parallel  RN  streams  generated  by  it  are  interleaved  using  the  perfect  shuffle 
pattern  to  create  a  single  RN  stream,  and  single-stream  test  batteries  are  used  to  evaluate  the  inter-stream  correlations 
among the RN streams[7,8]. A single-stream test package contains 20 different types of basic tests (which may be repeated
with  different  parameters  to  create  up  to  150  test  instances)  and  gives  pass/fail  status  for  each  test  applied  to  the  interleaved 
stream.  This  provides  a  vector  of  pass/fail  information  that  will  be  hard  to  use  for  comparisons  of  different  PRNGs. 
Furthermore, these tests consume a few billion RNs from the test stream. Therefore, they are not suitable
to  test  a  large  number  of  parallel  RN  streams;  if  a  billion  streams  are  interleaved  to  form  a  single  stream,  then  these  tests 
only  examine  a  few  numbers  from  each  stream,  which  may  not  be  enough  to  assess  the  inter-stream  correlations.  On  the 
other  hand,  if  a  billion  RN  streams  are  partitioned  into  several  subsets  with  each  subset  consisting  of  a  small  number  of  RN 
streams, and single-stream test batteries are applied on each set, then these tests will take several hundred hours on a desktop
machine  and  provide  multiple  vectors  of  pass/fail  information  that  will  be  hard  to  combine  into  an  easy-to-understand 
quality  metric. 

Another  approach  is  to  use  thoroughly  analyzed  applications  to  test  inter-stream  and  long-range  correlations  of  RNGs. 
For  example,  a  physics  application  involving  simulations  of  two-dimensional  (2D)  Ising  square  lattice  models  with  periodic 
boundary conditions, for which the exact solutions are known, is often used to evaluate PRNGs[3,4,7]. However, application-based
tests are computationally expensive and may not be adaptable to test billions of parallel streams at a time. Therefore,
faster  and  more  informative  statistical  tests  of  parallel  RN  streams  are  needed.  Currently,  there  are  very  few  parallel 
statistical  tests  that  do  not  require  serialization  of  RN  streams  and  have  the  potential  to  evaluate  inter-stream  correlations. 

We  present  a  new  inter-stream  correlation  (ISC)  test  that  evaluates  a  large  number  of  parallel  RN  streams  simultaneously, 
and  provides  an  easy-to-use  quality  metric.  The  ISC  test  divides  the  total  streams  to  be  evaluated  into  subsets  of  streams, 
and computes a correlation coefficient for each subset. These correlation coefficients are aggregated using a theoretically-sound
test method such as the Donner and Rosner test (DR test)[13] or the Kolmogorov-Smirnov test (KS test)[14], and a test
statistic  is  obtained.  If  the  test  statistic  is  too  high  compared  to  a  suitably  determined  critical  value,  the  claim  of  independent 
RN  streams  is  rejected.  Lack  of  rejection  indicates  that  the  RN  streams  are  likely  to  be  independent. 

We present preliminary results of the implementation of a prototype CPRNG and the application of the ISC test. Timing
tests show that CPRNG is nearly as fast as a basic PRNG, such as MLFG, and incurs only a small amount of overhead to
provide the context-awareness. The ISC test found significant correlations in the RN streams generated by the multiplicative
lagged Fibonacci generator (MLFG) in the widely used SPRNG package.

The  rest  of  the  paper  is  organized  as  follows.  Section  2  presents  the  basics  of  parallel  random  number  generators. 
Section 3 presents the context-aware PRNGs, and compares a prototype CPRNG with the widely used MLFG in the SPRNG
package.  Section  4  presents  the  ISC  test  and  its  application  to  CPRNG  and  MLFG,  with  up  to  1.5  billion  streams  analyzed. 
Section  5  describes  the  related  work  in  PRNGs  and  test  methods.  Section  6  concludes  the  paper. 

2.  Background 

We are interested in parameterized RNGs that have the capability to generate a large number of RN streams with
relatively simple changes to the initialization. Lagged Fibonacci generators[8,9,10] are easy to parameterize and, with careful
selection of the parameters, can generate a virtually unlimited number of distinct, high-quality RN streams. In
particular, we are interested in the multiplicative lagged Fibonacci generator (MLFG), which uses the recurrence relation:

$$x_n = x_{n-k} \times x_{n-l} \pmod{2^m}, \quad 0 < k < l < n, \qquad (1)$$

where $l$ and $k$ are the lags (or indices to the older random numbers used to generate the new random number), and the $x_i$'s are
positive and odd $m$-bit integers. This generator produces $2^{(m-3)(l-1)}$ different RN streams, each with a cycle length of
$2^{m-3}(2^l - 1)$. Therefore, there are $(m-3)(l-1)$ bits that need to be determined uniquely for each RN stream. (One of the initial lag
words and the least significant bits of all initial lag words are specified by the canonical form and parameters specified, and
are common to all RN streams with those parameters[10].)

For a 64-bit MLFG with lag 17, there are $2^{61 \times 16} = 2^{976}$ different RN streams, each with a distinct 976-bit seed value and a
cycle length of $2^{61}(2^{17} - 1) \approx 2^{78}$. In practice, the upper or the middle $b$ bits, $b < 64$, of the $x_i$'s are extracted and supplied as the
RNs to improve the randomness, since the lower bits are often less random owing to the arithmetic operation involved. We
used the SPRNG package[8] and the MLFG available from its library to implement a prototype CPRNG.
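
The recurrence in Equation 1 and the stream-count arithmetic above can be checked with a short sketch. This is an illustration only, not the SPRNG implementation: the lag pair (17, 5), the function names, and the toy seed words are assumptions made for the example.

```python
from collections import deque

M = 64          # word size in bits
L, K = 17, 5    # lag pair assumed for illustration

def mlfg(seed_words, count, m=M, l=L, k=K):
    """Return `count` values of x_n = x_{n-k} * x_{n-l} (mod 2^m)."""
    if len(seed_words) != l or any(w % 2 == 0 for w in seed_words):
        raise ValueError("need exactly l odd initial lag words")
    mask = (1 << m) - 1
    # state[0] holds x_{n-l}; state[-1] holds x_{n-1}
    state = deque(seed_words, maxlen=l)
    out = []
    for _ in range(count):
        x = (state[-k] * state[0]) & mask
        state.append(x)     # maxlen drops the oldest word automatically
        out.append(x)
    return out

def upper_bits(x, b, m=M):
    """Extract the upper b bits of an m-bit word."""
    return x >> (m - b)

def seed_bits(m, l):
    """Bits that must be chosen per stream: (m - 3)(l - 1)."""
    return (m - 3) * (l - 1)

def cycle_length(m, l):
    """Cycle length 2^(m-3) * (2^l - 1)."""
    return (1 << (m - 3)) * ((1 << l) - 1)
```

For the 64-bit, lag-17 case, `seed_bits(64, 17)` evaluates to 976 and `cycle_length(64, 17)` is just under $2^{78}$, in line with the figures quoted above. Because the product of two odd words is odd, every word produced stays odd.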

Additive lagged Fibonacci generators (ALFGs) are obtained by replacing the multiply operation in Equation 1 with an
add operation; the $x_i$'s are positive $m$-bit integers with at least one odd integer in the first $l$ lags. Compared to MLFGs, ALFGs
provide more distinct RN streams with longer cycles for the same bit-size. However, the intra-stream and inter-stream
correlations are more significant in ALFGs. To mitigate these issues, larger lags, $l = 1{,}279$ or larger, are used. The SPRNG
package combines two ALFGs with different lag words to provide a higher-quality ALFG. On the other hand, MLFG can
be used with smaller lags, e.g., $l = 17$. Therefore, for the most commonly used configuration in the SPRNG package, ALFGs
take twice as much time as MLFG to initialize a new RN stream and to generate RNs.

The SPRNG library package provides the init_rng() and get_rn_dbl() function calls to initialize a new RN stream and to obtain
the next RN in an already initialized stream, respectively. The init_rng() function is called by specifying the seed, the parameter
set that specifies the lags and the locations of the odd-numbered words in the initial set of lag words, the maximum number of
RN streams (denoted max_str) that will be requested by the application, and cur_str, the RN stream number in the range [0,
max_str) that needs to be initialized. The seed, parameter set, and max_str must be common in all init_rng() calls. For most
parallel applications, it is easy to allocate the RN streams to processes based on the input data and/or computations allocated
to them. For example, if a computational loop is partitioned cyclically among p processes, then iteration i is executed by
process i%p; in this case it is natural to allocate RN streams from the set i, i+p, i+2p, ... to process i.

Each call to init_rng() results in the initialization of the RN stream specified by the stream number, cur_str, and the
calling  code  is  given  a  pointer  to  the  RN  stream  that  should  be  used  as  argument  in  the  function  call  get_rn_dbl()  to  obtain 
the  next  RN  in  the  stream.  (SPRNG  library  provides  several  other  function  calls  including  requests  for  integer  RNs  instead 
of  reals,  but  they  are  not  of  interest  in  this  paper.) 
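
As a small illustration of the cyclic allocation described above, the following sketch lists the stream numbers a process would pass as cur_str. The helper names are hypothetical and are not part of SPRNG.

```python
def owner_of_iteration(i, p):
    """Process that executes iteration i under a cyclic partition."""
    return i % p

def streams_for_process(rank, p, max_str):
    """Stream numbers rank, rank+p, rank+2p, ... below max_str."""
    if not 0 <= rank < p:
        raise ValueError("rank must be in [0, p)")
    return list(range(rank, max_str, p))
```

With 4 processes and max_str = 10, process 1 would initialize streams 1, 5 and 9, which are exactly the iterations it executes.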

3.  Context-aware  Parallel  Random  Number  Generators 

If  a  process  uses  RNs  in  multiple  locations  for  different  purposes,  then  it  is  generally  recommended  that  a  distinct  RN 
stream be used for each such context. However, with the current RNG packages, this requires the application to explicitly
initialize  the  additional  RN  streams  needed  and,  more  importantly,  use  the  appropriate  RN  stream  pointer  in  each  context. 

This  puts  a  significant  burden  on  the  application  developer  to  manage  the  RN  streams  and  contexts.  Any  changes  to  the 
code  that  change  the  number  of  contexts  require  additional  work  by  the  application  developer  to  make  suitable  changes  to 
the  RN  stream  management.  While  it  is  natural  and  intuitive  to  partition  RN  streams  based  on  the  partitioning  of  input  data 
or  computations,  explicitly  managing  multiple  RN  streams  based  on  program  contexts  makes  the  application  less  portable 
and  distracts  the  application  developer. 

To  address  these  concerns  and  to  improve  the  quality  of  RNs  used  by  applications,  we  developed  the  CPRNG.  The 
design  methodology  is  to  take  a  parameterized  random  number  generator  that  has  the  capability  to  generate  a  large  number 
of  RN  streams  with  relatively  simple  changes  to  the  initialization  and  augment  it  with  a  scalable  and  automatic  initialization 
process.  Our  first  CPRNG  is  based  on  the  MLFG  described  in  Section  2.  We  used  the  SPRNG  package  and  the  MLFG 
available  from  its  library,  to  implement  the  prototype  CPRNG.  The  design  of  CPRNG  is  elaborated  below. 

3.1  CPRNG  Design 

In the CPRNG implementation, each init_rng() call allocates not just one RN stream but a set of distinct RN streams
and returns a pointer, str_ptr, to the set; the streams in this set can be customized with program context without further
calls to init_rng(). The RN-context, i.e., the context or program location from which an RN is requested, is used in
addition to the stream-set pointer, str_ptr, to determine the specific RN stream to be used. The RN-context is derived from
a combination of the program line number in the source code, the return address of the function call to get_rn_dbl(), the
process/thread numbers, and any user-supplied identifiers such as the iteration number. When the application requests
a random number using the function call get_rn_dbl(str_ptr), the RN-context is used to determine the specific RN
stream to be used in the set of streams pointed to by str_ptr. If this is the first call from the context, the appropriate
RN stream is automatically initialized with the RN-context, and an RN from the stream is returned.
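
The lookup described above can be sketched as follows. This is a rough analogue under stated assumptions, not the CPRNG code: the class name `StreamSet` is hypothetical, the call-site file name and line number stand in for the return address, and Python's `random.Random` stands in for the underlying MLFG stream.

```python
import random
import sys
import threading

class StreamSet:
    """Hypothetical stand-in for the stream set that str_ptr points to."""

    def __init__(self, seed):
        self.seed = seed
        self.streams = {}              # RN-context -> lazily created stream
        self.lock = threading.Lock()

    def get_rn_dbl(self, user_id=None):
        # RN-context: call site, calling thread, optional user identifier.
        frame = sys._getframe(1)       # CPython-specific: the caller's frame
        context = (frame.f_code.co_filename, frame.f_lineno,
                   threading.get_ident(), user_id)
        with self.lock:
            stream = self.streams.get(context)
            if stream is None:         # first request from this context
                stream = random.Random(repr((self.seed, context)))
                self.streams[context] = stream
        return stream.random()
```

Repeated calls from the same source line reuse one stream, while a call from another line, another thread, or with a different `user_id` gets its own stream without any explicit initialization by the caller.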

Figure 1 describes the initialization process by CPRNG with lag parameters $l$ and $k$, $0 < k < l - 2$. A call to init_rng()
results in the initialization of $l - 3$ of the lag words² using a sequential RNG such as the recursive with carry (RWC) generator
described in the Diehard package[2] seeded with the user-specified seed. These lag words are common to the initialization
of all RN streams, regardless of the process number or RN-context. One of the remaining three lag words is filled with an
ID that is guaranteed to be distinct for distinct cur_str numbers specified in init_rng(). The distinct ID word is common to
the set of RN streams that are allocated in response to the init_rng() call. The remaining two lag words are filled with the
RN-context so that distinct RN-contexts result in distinct RN streams.

Figure 1. Initialization of RN stream lags by CPRNG. Each lag word is a 64-bit word with maximum lag L. L-3 of the lag words
are filled randomly, based on the user-specified seed and a sequential RNG. These words are common to all RN streams used
during the execution of the application. Lag K, K<L-2, is initialized with a unique and distinct ID that is associated with the
cur_str used in the init_rng() call. Lags K+1 and K+2 are initialized with RN-context to create a distinct RN stream for each distinct
program context in each process.
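
The layout in Figure 1 can be sketched as below. This is a simplified illustration under several assumptions: `random.Random` replaces the sequential RWC generator, the context is taken to be an integer of up to 128 bits, the function name is hypothetical, and the steps that force each lag word into an odd canonical form are omitted.

```python
import random

WORD_BITS = 64

def init_lag_words(seed, stream_id, context, l=17, k=5):
    """Fill l lag words: l-3 common words, an ID word at index k,
    and two RN-context words at indices k+1 and k+2."""
    if not 0 < k < l - 2:
        raise ValueError("need 0 < k < l - 2")
    mask = (1 << WORD_BITS) - 1
    common = random.Random(seed)       # stand-in for the RWC generator
    words = [0] * l
    for i in range(l):
        if i not in (k, k + 1, k + 2):
            words[i] = common.getrandbits(WORD_BITS)
    words[k] = stream_id & mask                      # distinct per cur_str
    words[k + 1] = (context >> WORD_BITS) & mask     # upper context bits
    words[k + 2] = context & mask                    # lower context bits
    return words
```

Two streams that share a seed differ only in the ID word when their cur_str differs, and only in the context words when their RN-context differs.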


For a CPRNG based on MLFG with maximum lag $l = 17$ and 64-bit words, $2^{2 \times 61} = 2^{122}$ distinct RN streams are allocated
with each init_rng() call. Based on the context and str_ptr argument used in a call to get_rn_dbl(), an appropriate stream is
selected, automatically initialized prior to first use, and the next RN in the stream is returned. CPRNG may be used without
RN-contexts by choosing appropriate parameters to the init_rng() call. If RN-contexts are not used, then the two lag words
that are normally filled with RN-context are filled with the random bits generated by the sequential RWC generator; the lag
word with the distinct ID ensures that RN streams are distinct for distinct values of cur_str specified in init_rng(). CPRNG
is then simply a basic MLFG.

For  applications  that  use  a  large  and  variable  number  of  RN  streams,  having  to  specify  the  maximum  number  of  streams 
used  during  an  execution  run  is  a  limitation.  Furthermore,  certain  large-scale  parallel  applications  may  spawn  additional 
processes  and  threads  dynamically  depending  on  the  input  data  and  intermediate  results.  To  accommodate  such  situations, 
CPRNG assigns $2^{10}$ distinct IDs for the lag word $k$ upon a call to init_rng(), independent of any streams allocated to handle
RN  contexts.  Typically,  only  one  of  these  IDs  is  used  by  a  process.  However,  if  a  process  spawns  threads  or  child  processes 
and  needs  to  use  additional  distinct  RN  streams  without  going  through  the  initialization  process,  it  can  have  them  without 
any  communication  overhead  by  using  the  original  initialization  with  the  distinct  ID  word  replaced  with  one  of  the  unused 
IDs  from  its  allocated  IDs.  This  leads  to  faster  initialization  of  the  new  RN  streams  on  demand.  If  more  RN  streams  are 
needed and init_rng() is called with a cur_str value greater than max_str, a monotonically increasing counter is used to ensure
that the lag word K is distinct. However, access to this counter needs to be serialized by using appropriate mutex locks
in threaded applications or by assigning it to a process that serves the counter values to the other processes of the application.
Only in these instances does CPRNG incur an additional communication or serialization overhead compared to the
static methods used in current packages such as SPRNG. On the other hand, CPRNG provides a virtually unlimited
number of RN streams on demand, and avoids the depletion of available RN streams that can occur with static partitioning
of the available RN streams for applications with many levels of dynamic process/thread creation.

²The initialization of the lag words is more complicated than the simpler description given here. For MLFG, all the lag words must be odd values. The
two consecutive 32-bit RNs generated by the RWC generator are used to form a 61-bit integer, and a least significant bit determined by the canonical form
and parameter set is appended to it to form a 62-bit number, say, $z$. The actual lag word is formed by using the operation $(-1)^y \, 3^z \bmod 2^{64}$, where $y$ is a
randomly generated 1 or 0. However, for easier description, we omit these implementation details. See Reference 10 for the complete details.
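
One way to picture the ID allocation described in this section is the sketch below. The layout of the ID space (consecutive blocks of $2^{10}$ IDs indexed by cur_str, with dynamically requested blocks placed after them) and the class name are assumptions made for illustration; the text only states that each init_rng() call receives $2^{10}$ IDs and that a serialized counter is used beyond max_str.

```python
import threading

IDS_PER_INIT = 1 << 10   # 2^10 IDs reserved per init_rng() call

class IdAllocator:
    """Hypothetical allocator for the distinct-ID lag word."""

    def __init__(self, max_str):
        self.max_str = max_str
        self._next_block = max_str         # first block past the static range
        self._lock = threading.Lock()

    def block_for(self, cur_str):
        if 0 <= cur_str < self.max_str:
            block = cur_str                # static: no communication needed
        else:
            with self._lock:               # serialized monotonic counter
                block = self._next_block
                self._next_block += 1
        base = block * IDS_PER_INIT
        return range(base, base + IDS_PER_INIT)
```

A process would normally use only the first ID of its block; a spawned thread or child process could take the next unused ID from the same block without contacting anyone, and only requests beyond max_str pay for the lock.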

CPRNG  is  implemented  in  the  SPRNG  package  as  an  additional  PRNG.  The  implementation  provides  the  same 
application interface as the other PRNGs in the package. Any parallel application currently designed to use SPRNG
generators  can  use  CPRNG  by  using  an  appropriate  RNG  code.  No  additional  application  modifications  are  needed.  Just 
like  the  other  PRNGs  in  the  SPRNG  package,  CPRNG  produces  consistent  and  predictable  RN  streams  for  an  application 
regardless  of  the  number  of  processes/threads  used  for  parallel  computation.  The  CPRNG  implementation  is  free  of 
memory  leaks,  is  multithread  safe,  and  works  seamlessly  with  MPI-based  applications.  The  GPU  version  of  CPRNG  is 
implemented  as  a  different  generator  with  some  restrictions  on  features:  no  context-awareness,  and  the  maximum  number 
of  streams  needed  by  the  application  must  be  specified  at  the  beginning  of  the  program  execution. 

3.2  Timing  Tests 

Two computers, a desktop computer with an Intel Core i7-870 CPU and a rack server with a Xeon E630 CPU, are used
to estimate the time required to initialize a random number (RN) stream by calling init_rng(), and the time taken
to generate a single random number from an already initialized stream. We measured CPRNG with lag 17 and three generators from
the SPRNG package: MLFG, the multiplicative lagged Fibonacci generator with lag 17; ALFG, the lagged Fibonacci generator
that combines two additive Fibonacci generators with lag 1,279; and ALFG17, the additive lagged Fibonacci generator
with lag 17.

We used the Intel CPU time-stamp counter for the time-stamps. For the RN stream initialization test, the time taken to
initialize a single RN stream is subtracted from the time taken to initialize two RN streams. This is repeated 100 times and
the average of the times is taken as a sample point. This experiment is repeated 10 times, and the average of the 10 samples
and the corresponding 95% confidence interval are calculated. For the RN generation test, the time taken to generate 1,000
RNs from an already initialized stream is subtracted from the time taken to generate 1,000 RNs each from two previously
initialized RN streams. This time is divided by 1,000 to get the time taken to generate a single RN. This is repeated 100
times and the average is taken as a single sample point. This experiment is repeated 10 times, and the average of the 10
sample points and the corresponding 95% confidence interval are calculated.
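
The differencing procedure above can be sketched as follows. Two substitutions are assumptions of this sketch: `time.perf_counter_ns()` replaces the CPU time-stamp counter, and the confidence interval uses the Student-t critical value for 10 samples, since the text does not say how the interval was computed.

```python
import math
import statistics
import time

T_CRIT_95_DF9 = 2.262   # two-sided 95% Student-t value, 9 degrees of freedom

def elapsed_ns(fn):
    """Wall-clock time of one call to fn, in nanoseconds."""
    start = time.perf_counter_ns()
    fn()
    return time.perf_counter_ns() - start

def sample_point(do_one, do_two, reps=100):
    """Average of time(do_two) - time(do_one) over reps repetitions."""
    diffs = [elapsed_ns(do_two) - elapsed_ns(do_one) for _ in range(reps)]
    return statistics.mean(diffs)

def mean_ci95(samples, t_crit=T_CRIT_95_DF9):
    """Mean and 95% half-width; the default t_crit assumes 10 samples."""
    mean = statistics.mean(samples)
    half = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half

def measure(do_one, do_two, samples=10, reps=100):
    """10 sample points of 100 repetitions each, as in the text."""
    return mean_ci95([sample_point(do_one, do_two, reps)
                      for _ in range(samples)])
```

Here `do_one` would initialize one stream (or draw 1,000 RNs from one stream) and `do_two` would do the same for two streams, so the difference cancels the fixed measurement overhead.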

Figure  2  gives  the  times  in  clock-ticks — 2.93  ticks/ns  for  the  Core  i7-870,  and  2.53  ticks/ns  for  the  Xeon  E630  machines. 
The  chart  on  the  left  gives  the  initialization  time  of  an  RN  stream,  while  the  chart  on  the  right  gives  the  time  taken  to  get  an 
RN  from  an  initialized  stream.  The  initialization  costs  of  CPRNG  are  about  the  same  as  those  of  MLFG,  on  which  CPRNG 
is  based.  The  cost  of  generating  an  RN  is  about  3  ticks  higher  compared  to  MLFG  owing  to  the  additional  processing 
needed  for  context-aware  RN  generation.  This  can  be  easily  eliminated  if  the  application  does  not  require  context-aware 
RN  generation.  ALFG  has  high  initialization  overhead  since  it  uses  two  additive  Fibonacci  generators  with  a  large  amount 
of  lag  (the  oldest  RN  used  in  calculating  the  next  RN)  to  provide  high-quality  RN  streams.  To  rule  out  any  experimental 
error,  we  tested  ALFG  with  lag  17  (which  is  not  recommended),  whose  initialization  cost  is  comparable  to  those  of  MLFG 
and  CPRNG. 

Figure 2. Time taken to initialize RN streams (left chart) and to generate RNs (right chart). A desktop computer with Intel Core
i7-870 CPU and a rack server with Xeon E630 are used. The times are given in clock-ticks: 2.93 ticks/ns for the Core i7-870, and
2.53 ticks/ns for the Xeon E630. The y-axis for the left chart is in log-scale. The 95% confidence intervals are ± 1% of the mean
values reported.

4. Inter-Stream Correlation Test


The inter-stream correlation (ISC) test evaluates the correlations among a large number of RN streams. The RN streams
are divided into several subsets, and the streams in a subset are interleaved using the perfect shuffle or the biased interleaving
method. Consider three RN streams A, B and C with RNs $a_1, a_2, a_3, \ldots$; $b_1, b_2, b_3, \ldots$; and $c_1, c_2, c_3, \ldots$, respectively. In perfect
shuffle interleaving, a new stream $a_1, b_1, c_1, a_2, b_2, c_2, a_3, \ldots$ is created. In biased interleaving, $a_1, b_1, a_2, c_1, a_3, b_2, a_4, \ldots$ is
created. The RNs in the odd-numbered positions form the X variates and the RNs in the even-numbered positions form the
Y variates. These are transformed into normal bi-variates using the Box-Muller transform[12]. The correlation coefficient, $r$, for the
bi-variate pairs is computed. This is repeated several times to obtain multiple $r$'s. Collectively, these $r$'s are the samples
that can be used to estimate $\rho$, the true common correlation coefficient among the parallel RN streams generated by the
PRNG being evaluated.
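
The computation of a single $r$ can be sketched as below. This is one reading of the description, with stated assumptions: each (X, Y) pair of uniforms is mapped to a pair of normal variates by the basic Box-Muller formula, the uniforms lie in [0, 1), and all function names are illustrative.

```python
import math

def perfect_shuffle(streams, total):
    """Round-robin interleave: a1, b1, c1, a2, b2, c2, ..."""
    iters = [iter(s) for s in streams]
    return [next(iters[i % len(iters)]) for i in range(total)]

def biased_interleave(streams, total):
    """First stream fills the odd-numbered positions; the others take
    turns in the even-numbered ones: a1, b1, a2, c1, a3, b2, a4, ..."""
    first, *rest = [iter(s) for s in streams]
    out, turn = [], 0
    for pos in range(total):
        if pos % 2 == 0:                 # 1st, 3rd, 5th, ... position
            out.append(next(first))
        else:
            out.append(next(rest[turn % len(rest)]))
            turn += 1
    return out

def box_muller(x, y):
    """Map uniforms x, y in [0, 1) to two standard normal variates."""
    radius = math.sqrt(-2.0 * math.log(1.0 - x))   # 1 - x avoids log(0)
    angle = 2.0 * math.pi * y
    return radius * math.cos(angle), radius * math.sin(angle)

def pearson_r(xs, ys):
    """Sample correlation coefficient of paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def isc_r(streams, pairs, interleave=perfect_shuffle):
    """Correlation coefficient r for one subset of streams."""
    seq = interleave(streams, 2 * pairs)
    normals = [box_muller(x, y) for x, y in zip(seq[0::2], seq[1::2])]
    return pearson_r([z[0] for z in normals], [z[1] for z in normals])
```

Calling `isc_r` once per subset of streams yields the collection of $r$'s that the aggregate tests operate on.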

The $r$'s are aggregated using a theoretically-sound test method such as the Donner and Rosner test (DR-test)[13] or the
Kolmogorov-Smirnov test (KS-test)[14], and a test statistic is obtained. The statistic for the DR-test is denoted as $t_H$ and the
statistic for the KS-test as $D_{max}$. For each test, there is a critical value that is computed based on the desired significance level
and the number of $r$'s used. For example, for the DR-test at a significance level of 0.05, the critical value is 1.96 provided the
number of bi-variate pairs used to calculate each $r$ is large. If the test statistic is significantly above the critical value, then
the RN streams generated by the PRNG are likely to have significant inter-stream correlations.

The DR-test combines the $r$'s and gives the test statistic $t_H$, which is a standard normal variate. This can be used to
test the null hypothesis $H_0: \rho = 0$. Large absolute values of $t_H$ will lead to the rejection of the null hypothesis and the
acceptance of the alternative hypothesis $H_1: \rho \neq 0$. For the significance level $\alpha = 0.05$, absolute values of $t_H$ above 1.96 lead to
the rejection of the claim that parallel RN streams are independent; the probability that the rejection is erroneous is $\alpha = 0.05$.
One could use different significance levels: for $\alpha = 0.02$, absolute values of $t_H$ above 2.33 will lead to rejection of the
claim of independence of RN streams with only 0.02 probability of being wrong.
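
The text does not give the formula for $t_H$. A minimal sketch, assuming the common Fisher z-transform form of a combined test for a common correlation and an equal number of bi-variate pairs behind every $r$, might look like this (function names are illustrative):

```python
import math

def dr_statistic(rs, pairs_per_r):
    """Combine correlation coefficients into one approximately standard
    normal statistic under H0: rho = 0 (assumed Fisher z form)."""
    weight = pairs_per_r - 3               # 1 / Var(atanh(r)) for each r
    z_sum = sum(math.atanh(r) for r in rs)
    # weighted mean of z times sqrt(total weight), with equal weights
    return z_sum * math.sqrt(weight / len(rs))

def rejects_independence(t_h, critical=1.96):
    """Two-sided decision; 1.96 corresponds to alpha = 0.05."""
    return abs(t_h) > critical
```

With this form, four coefficients of 0.1, each computed from 103 pairs, give a statistic of about 2.007, which exceeds 1.96 but not the 2.33 threshold quoted for $\alpha = 0.02$.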

The distribution of $r$'s is approximately normal. These $r$'s can be converted into standard normal variates using the sample
variance of the $r$'s and the fact that we are testing for $\rho = 0$. This enables us to apply the KS-test on the distribution of $r$'s. In
this test, the KS-test statistic, $D_{max}$, computed using the $r$'s must be less than the critical value, $D_{\alpha,n}$, for significance level $\alpha$
and $n$, the number of $r$'s used. For the KS-test, at a significance level of 0.01, the critical value is 0.0274 when the number of
$r$'s used is 1,500.
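
A sketch of the $D_{max}$ computation follows. It assumes the $r$'s are standardized by dividing by their sample standard deviation, with the mean taken as zero under the null hypothesis; the exact standardization used in the implementation is not stated, and the function names are illustrative.

```python
import math
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_dmax(rs):
    """Largest gap between the empirical CDF of the standardized r's
    and the standard normal CDF."""
    n = len(rs)
    s = statistics.stdev(rs)
    zs = sorted(r / s for r in rs)         # mean fixed at 0 under rho = 0
    return max(max((i + 1) / n - normal_cdf(z), normal_cdf(z) - i / n)
               for i, z in enumerate(zs))
```

The returned value would then be compared against the critical value for the chosen significance level and number of $r$'s, for example the 0.0274 quoted above for 1,500 coefficients.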

Figure  3  gives  the  results  of  the  two  tests  for  shuffle-interleaving  of  RN  streams  generated  by  CPRNG  and  MLFG.  In 
our tests, we used 1,500 sets of random number streams with the set size varied from 10 to 1,000,000 to obtain $r_1, r_2, \ldots, r_{1500}$.
When the set size is 1,000,000, a million streams are used to obtain a single $r$, and a total of 1.5 billion streams are
used  to  obtain  the  1,500  r’s  used  to  calculate  the  test  statistic.  The  dashed-lines  indicate  the  critical  values  for  the  test 
statistics.  Our  results  indicate  that  MLFG  (the  built-in  random  number  generator  in  the  SPRNG  package)  fails  both  DR- 
and  KS-tests  for  test  configurations  that  use  1.5  million  or  more  streams.  On  the  other  hand,  CPRNG,  which  is  also  based 
on  the  same  theoretical  foundation  as  that  of  MLFG,  performs  well;  it  narrowly  fails  the  KS-test  only  for  the  largest  test 
configurations  we  used. 

Figure 3. ISC tests for CPRNG and MLFG. The dashed lines indicate the critical values. The y-axes for both charts are in
log-scale. For both tests, MLFG's test statistic is significantly higher than the critical value, indicating that the RN streams
generated by MLFG may have significant inter-stream correlations and must be tested further.

5. Related Work


The designs of sequential and parallel RNGs have been extensively investigated by many researchers owing to their
importance to computational science and to the elegant, mathematical nature of the problem. Knuth[1] discusses several
RNGs and many excellent single-stream test methods that are implemented in popular test batteries such as Dieharder[16]
and TestU01[15]. Linear congruential generators that use a recursive equation of the form $x_n = a \cdot x_{n-1} + b \pmod{2^m}$, where $a$
and $b$ are constants, are commonly available as part of the C standard library in a typical UNIX environment. One of
the most popular sequential RNGs is the Mersenne twister[5], which offers an RN stream with a cycle of length $2^{19937} - 1$. A graphics
processing unit (GPU) version of this RNG[6] is commonly used by applications designed to use GPUs.

Additive and multiplicative lagged Fibonacci generators[8-10] have been extensively investigated because of the ease
with  which  they  can  be  used  to  generate  distinct  RN  streams.  Of  the  two,  MLFG  is  considered  to  be  more  robust,  producing 
higher-quality RN streams. Both generators are implemented in the popular SPRNG package[8]. We have used the SPRNG
package  extensively.  The  prototype  implementation  of  CPRNG  is  based  on  the  MLFG  implementation  and  supports 
SPRNG’s  application  interface. 

SPRNG  also  implements  several  sequential  tests  and  provides  a  systematic  way  to  interleave  several  streams  into 
a single stream and apply the sequential tests. However, owing to the availability of more exhaustive test packages,
Dieharder and TestU01, we did not use the single-stream tests in SPRNG. Another important resource provided by SPRNG
is the Ising model simulation codes based on the Metropolis and Wolff algorithms. These applications are widely used to
evaluate sequential and parallel RNGs[3,4,7].

6.  Conclusion 

Context-aware  parallel  random  number  generators  are  based  on  a  new  approach  to  allocate  and  manage  RN  streams  by 
parameterized random number generators that can generate virtually unlimited numbers of distinct RN streams. Compared
to  the  parallel  random  number  generators  in  the  current  packages  such  as  SPRNG,  CPRNGs  can  automatically  provide 
distinct  RN  streams  depending  on  the  program  context  to  improve  the  quality  of  the  RNs  used.  To  achieve  the  same  effect 
with the current PRNGs, the application needs to explicitly manage multiple streams and their usage based on the program
context.  Furthermore,  CPRNGs  support  highly-complex  parallel  applications  that  require  a  large  and  variable  number  of 
RN  streams  by  dynamically  allocating  RN  streams  beyond  the  maximum  number  of  RN  streams  specified  at  the  beginning 
of  program  execution.  In  contrast,  the  current  PRNGs  do  not  allow  applications  to  request  RN  streams  beyond  the  initially 
specified  number  of  RN  streams.  Some  implementations,  e.g.,  SPRNG,  handle  this  issue  by  partitioning  the  total  RN 
streams  using  a  binary  partitioning  scheme.  For  applications  that  have  many  levels  of  dynamic  process/thread  creation,  this 
can  result  in  depletion  of  RN  streams  available  to  dynamically-created  processes/threads. 

The  inter-stream  correlation  test  evaluates  the  correlations  among  a  large  number  of  RN  streams.  Using  a  well-known 
test  method  such  as  the  Donner  and  Rosner  test  or  the  Kolmogorov-Smirnov  test,  it  provides  an  aggregate  PRNG  quality 
metric. This test complements the existing single-stream test batteries and application-based tests currently available. It was
applied to evaluate inter-stream correlations among as many as 1.5 billion RN streams. The ISC test shows that the MLFG
used  in  SPRNG  has  significant  inter-stream  correlations  when  1.5  million  or  more  streams  are  considered.  In  addition  to 
providing  an  easy-to-use  quality  metric,  the  ISC  test  is  fast  and  can  be  adapted  to  on-line  testing,  in  which  the  actual  RNs 
used  by  an  application  are  fed  to  ISC  test,  and  overall  quality  of  the  RNs  used  is  provided  at  the  end  of  the  execution  of  the 
application. 

In  the  future,  we  plan  to  revise  the  current  implementation  and  release  a  CPRNG  library  package  to  the  HPC  community. 
We  also  plan  to  design  and  implement  additional  CPRNGs  based  on  other  RNGs.  We  will  work  with  HPC  practitioners  in 
adapting  new  applications  that  use  multi-scale  simulation  models  to  augment  the  current  test  methods. 

VERIFICATION OF PSEUDORANDOM NUMBER STREAMS

PRIORITY  CLAIM 

This application claims priority to U.S. Provisional Application No. 61/454,259 entitled “Verification of Pseudorandom
Number Streams” to Boppana et al. filed Mar. 18, 2011, which is incorporated herein by reference in its entirety.

BACKGROUND 

1.  Field 

This disclosure relates to the field of computation. More particularly, this disclosure relates to methods for assessing
pseudorandom number streams.

2.  Description  of  the  Related  Art 

Random number generators, which generate streams of seemingly
random numbers, are used in many computing applications. An
application may use a single stream of random numbers or multiple
streams of random numbers simultaneously. A sequential random
number generator is designed to generate a single stream of random
numbers, the starting point of which may be changed with the
initial (seed) value. A parallel random number generator (PRNG) is
designed to generate multiple, independent streams of random
numbers simultaneously with a simple change in a parameter used to
initialize the random number streams.

It is often useful to test a random number generator to assess the
quality of the random number stream. Some single-stream statistical
test batteries provide a pass/fail indication for each test in the
battery, since it may not be meaningful to combine the statistical
computations from multiple tests to provide an overall quality
metric for the RNG (random number generator) tested. Therefore, it
is common to treat the test results as a multi-bit vector, with
each bit representing the pass/fail status for a test. The
statistical test batteries do not provide a single quantitative
metric for comparing two generators. This could be a limitation if
two RNGs that need to be compared fail different tests.

Single-stream tests may be ineffective for testing the correlations
of random numbers among a large number (e.g., thousands to
billions) of parallel random number streams, since a typical
single-stream test method may operate on blocks of a few thousand
numbers at a time. Typical existing test methods may be considered
off-line methods in the sense that the tests are fed with data
generated, specifically for test purposes, by the random number
generator that is being evaluated.

Parallel random number streams may be generated by a parameterized
family of pseudorandom number generators, by a collection of true
random number generators that generate random numbers based on
environmental signals such as noise levels and temperature,
computing and communication delays, events induced by computer
users or other sources, or any combination of the pseudo- and true
random number generators. The quality of the random numbers used
may be crucial for quick and accurate results from computer-based
simulations and for robust security protocols and security keys
used in security protocols.

Some methods to test and assess the independence of parallel random
number streams are typically based on sequential test methods that
are designed to test intra-stream correlations of a single random
number stream. One practice for statistical testing of PRNG quality
is to generate parallel streams, interleave them to form a single
stream, and apply single-stream tests to the interleaved stream. If
the interleaved stream passes most or all of the single-stream
tests, then the PRNG may be deemed to be of good quality and is
accepted for use in applications.

SUMMARY 

In an embodiment, a method of assessing parallel random number
streams includes mixing two or more parallel random number streams.
Mixing the parallel random number streams may include pairing one
of the random number streams with one or more of the other random
number streams. For each pairing of the parallel random number
streams, an inter-stream correlation value may be computed based on
a correlation between the two random number streams in the pair. A
quality metric for the parallel random number streams is determined
from inter-stream correlation values for the pairs of the parallel
random number streams.

In  an  embodiment,  a  method  of  assessing  quality  of  a 
random  number  stream  includes  segmenting  the  random 
number  stream  into  two  or  more  random  number  substreams . 
The  random  number  substreams  may  be  mixed.  Mixing  the 
random  number  substreams  may  include  pairing  one  of  the 
substreams  with  one  or  more  of  the  other  substreams.  For 
each  pair  of  the  random  number  substreams,  a  correlation 
value  may  be  computed  based  on  a  correlation  between  the 
random  number  substreams  in  the  pair.  A  quality  metric  for 
the  random  number  stream  is  determined  from  correlation 
values  for  the  pairs  of  the  random  number  substreams. 

In  various  embodiments,  methods,  systems  and  apparatus 
are  used  to  test  a  large  number  of  parallel  random  number 
streams  and  to  quantify  interstream  correlations  among  them 
so  that  their  randomness  can  be  assessed.  Correlations  may  be 
tested  among  a  large  number  (hundreds  to  billions)  of  streams 
and  the  computed  correlation  coefficients  may  be  combined 
so  that  the  user  of  a  parallel  random  number  generator  can 
assess  a  priori  or  dynamically  (during  the  consumption  of  the 
random  numbers)  the  quality  of  random  numbers  used  for 
his/her  application.  In  some  embodiments,  an  online  test  of
the  quality  of  RN  streams  is  performed  as  the  random  numbers
are  generated  by  the  PRNG  for  actual  application  use.

In some embodiments, an interstream correlation (ISC) test
evaluates a large number of parallel RN streams simultaneously and
provides a quality metric. The ISC test may divide the total
streams to be evaluated into subsets of streams, with at least two
streams in each subset, and compute a correlation coefficient for
each subset. These correlation coefficients may be combined using a
theoretically sound test method such as the Donner and Rosner test
(DR test) or Kolmogorov-Smirnov test (KS test), and a test
statistic may be obtained. If the test statistic is higher than a
suitably determined critical value, the claim of independent RN
streams is rejected. A lack of rejection indicates that the RN
streams are likely to be independent.

BRIEF  DESCRIPTION  OF  THE  DRAWINGS 

FIG.  1  is  an  exemplary  block  diagram  illustrating  a  parallel 
pseudorandom  number  generator  test  metric  computation 
according  to  one  embodiment. 

FIG. 2 is an exemplary flow chart of the logic implemented by an
inter-stream correlation test according to one embodiment.

FIG.  3  is  a  flow  diagram  illustrating  one  embodiment  of 
assessing  parallel  random  number  streams. 

FIG.  4  is  a  flow  diagram  illustrating  one  embodiment  of 
assessing  a  random  number  stream. 
While the invention is described herein by way of example for
several embodiments and illustrative drawings, those skilled in the
art will recognize that the invention is not limited to the
embodiments or drawings described. It should be understood that the
drawings and detailed description thereto are not intended to limit
the invention to the particular form disclosed, but on the
contrary, the intention is to cover all modifications, equivalents
and alternatives falling within the spirit and scope of the present
invention as defined by the appended claims. The headings used
herein are for organizational purposes only and are not meant to be
used to limit the scope of the description or the claims. As used
throughout this application, the word “may” is used in a permissive
sense (i.e., meaning having the potential to), rather than the
mandatory sense (i.e., meaning must). Similarly, the words
“include”, “including”, and “includes” mean including, but not
limited to.

DETAILED  DESCRIPTION  OF  EMBODIMENTS 

The following abbreviations and acronyms are used herein.
RN: random number;
RNG: pseudorandom number generator;
PRNG: parallel pseudorandom number generator;
ISC: interstream correlation;
CPU: central processing unit or processor;
GPU: graphics processing unit or graphics processor used for
general purpose array computing;
MC: Monte Carlo simulations.

As used herein, “pairing”, in the context of number streams,
includes mixing or combining one stream with one or more other
streams, or considering or assessing one stream in relation to one
or more other streams (for example, computing a correlation between
two streams). As examples, a pairing may include: (a) pairing a
selected stream with another stream, (b) pairing a selected stream
with an interleaved stream of two or more other streams, and (c)
interleaving a selected stream and one or more other streams.

As used herein, “random number” includes, but is not limited to, a
true random number, a pseudorandom number, or a number generated
from a combination of true random and pseudorandom number methods.
As used herein, a “random number generator” includes, but is not
limited to, a pseudorandom number generator.

FIG. 1 is an exemplary block diagram illustrating the PRNG test
metric computation. In FIG. 1, PRNG 101 is the parallel random
number generator that needs to be tested for the independence of
its streams 102. Each line may provide a single stream of RNs
spaced in time. These RNs may be fed to the application 103 as part
of the application’s input data. The application 103 may be
executed normally and the output of the application may be
obtained.

In some embodiments, a parallel random number generator may be part
of the application. In such cases, PRNG 101 and Application 103 may
be described by a single block feeding ISC Tester 105.

ISC Tester 105 may be fed with RN streams 102 and a test
specification. The test specification may specify the interleaving
method for mixing the streams and the statistical method that is
used for computation of a quality metric.

FIG. 2 is an exemplary flow chart of logic implemented by an
inter-stream correlation test according to one embodiment. ISC
Tester 105 may be fed with parallel RN streams and test
specification criteria. The initialization and storage unit 201 may
ensure that these RNs are available for repeated use during the
test method. Based on the specified interleaving, stream mixer
program 202 may select a stream and mix it with the remaining
streams (if the specification is biased interleaving) or with a
subset of the other streams (if the specification is group,
shuffled or pairwise interleaving) to create a single stream with
RNs from the selected stream occupying the odd numbered positions
and the RNs from the other streams occupying the even numbered
positions. Stream mixer program 202 may skip a user-specified
number of initial RNs from one or more of the streams prior to
mixing them. The RNs in the odd numbered positions (positions 1, 3,
5, ...) from the resulting mixed stream may be considered as
$x_i$'s and the RNs in the even numbered positions as $y_i$'s.
Therefore, the resulting mixed stream may be considered as a
sequential stream of $(x_i, y_i)$ bivariate pairs. This mixed
stream may be fed to correlation coefficient computing program 203.
Correlation coefficient computing program 203 may calculate
inter-stream correlations of the two streams provided to it by the
stream mixer 202. The computed correlation coefficient is stored. A
tester 204 checks if all the desired combinations of interstream
correlations are computed. If one or more combinations remain, the
stream mixer provides the next stream pair to the correlation
coefficient computing program 203. If all desired combinations of
stream pairs are examined, then PRNG quality metric 205 is
computed. The PRNG quality metric may be computed using, in various
embodiments, an aggregation method, a goodness-of-fit method, a
percentile method or a mean absolute deviation method. In some
embodiments, the method for computing the PRNG quality metric is
based on user specification. In some embodiments, the final output
(which may be a p-value in statistics) may be a significance level
above which the claim of independence of the parallel streams
cannot be rejected. In certain embodiments, the user may specify a
significance level, and the quality metric is used to determine if
the PRNG meets the user-specified significance level.

FIG. 3 is a flow diagram illustrating one embodiment of assessing
parallel random number streams. In some embodiments, the parallel
random number streams are generated by a random number generation
system for purposes of evaluating the quality of the random number
generation system. This may be described as an a priori or offline
test. In other embodiments, the quality of parallel random number
streams generated on demand by an application is assessed
continually while the application is running. This may be described
as a dynamic, on-the-fly, or online test.

At  220,  parallel  random  number  streams  may  be  mixed  in 
one  or  more  ways  to  create  one  or  more  streams  of  bivariate 
pairs.  Mixing  the  parallel  random  number  streams  may 
include  pairing  the  random  number  streams  with  one  another. 
In  some  embodiments,  a  selection  of  a  mixing  method  to  be 
used  for  mixing  the  random  number  streams  is  received  from 
a  user. 

At 222, an inter-stream correlation value may be computed for each
mixed stream of bivariate pairs based on a correlation among the
random number streams used to create the mixed stream. The
correlation value may be, for example, a correlation coefficient
computed by taking several (two or more) bivariate pairs from the
mixed stream. The number of bivariate pairs used in the correlation
value computation may be specified by the user.

At 224, a quality metric for the parallel random number streams may
be determined from inter-stream correlation values for the mixed
streams. The quality metric may serve as a figure of merit for the
parallel random number streams. The quality metric may provide a
measure of the independence of the parallel number streams from one
another. In some embodiments, a selection of a testing method to be
used for computing a quality metric for the random number streams
is received from a user. The quality metric may be measured against
a significance level specified by a user.

FIG. 4 is a flow diagram illustrating one embodiment of assessing a
random number stream. In some embodiments, the random number stream
is generated by a random number generation system for purposes of
testing the random number generation system. In other embodiments,
the quality of the random number stream is assessed during
consumption of the random numbers by an application (online test).

At 240, a random number stream is segmented into random number
substreams. In one embodiment, the random number stream is
segmented using a leap-frog method. In another embodiment, the
random number stream is segmented using a cycle-division method.
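
The two segmentation methods are named here without a definition.
The sketch below assumes their usual meanings in the parallel RNG
literature: leap-frog deals the stream out to the substreams in
round-robin order, and cycle division cuts it into contiguous
blocks. The function names `leap_frog` and `cycle_division` are
illustrative only and do not come from the disclosure.

```python
from typing import List, Sequence


def leap_frog(stream: Sequence[float], k: int) -> List[List[float]]:
    """Substream j receives elements j, j+k, j+2k, ... of the stream."""
    return [list(stream[j::k]) for j in range(k)]


def cycle_division(stream: Sequence[float], k: int) -> List[List[float]]:
    """Substream j receives the j-th contiguous block of the stream.

    Trailing elements that do not fill a whole block are dropped.
    """
    block = len(stream) // k
    return [list(stream[j * block:(j + 1) * block]) for j in range(k)]


data = list(range(12))          # stand-in for 12 random numbers
print(leap_frog(data, 3))       # [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]
print(cycle_division(data, 3))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```

Either result can then be handed to the mixing step as if the
substreams were parallel streams.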

At 242, random number substreams may be mixed to form substreams of
bivariate pairs. Mixing the random number substreams may include
pairing the random number substreams with one another. In some
embodiments, a selection of a mixing method to be used for mixing
the random number substreams is received from a user.

At 244, an inter-stream correlation value may be computed for each
mixed substream of bivariate pairs based on a correlation between
the substreams used to create the mixed substreams. The number of
bivariate pairs (at least two) used in the correlation value
computation may be specified by the user.

At 246, a quality metric for the random number stream may be
determined from inter-stream correlation values for the mixed
substreams. The quality metric may serve as a figure of merit for
the random number stream. The quality metric may serve as a figure
of merit for the parallel random number streams. The quality metric
may provide a measure of the independence of the parallel number
streams from one another. In some embodiments, a selection of a
testing method to be used for computing a quality metric for the
random number streams is received from a user. The quality metric
may be measured against a significance level specified by a user.

In some embodiments, inter-stream correlations are quantified among
multiple parallel random number (RN) streams as a numerical factor,
and a figure of merit is assigned for a PRNG. In one embodiment, a
system includes three main components: stream mixer 202,
correlation coefficient calculator 203, and PRNG quality metric
calculator 205.

Let us consider k, where $k > 2$, RN streams $S_1, S_2, \ldots,
S_k$ for which we need to check if there is a significant
inter-stream correlation (ISC) among them. To compute the
correlation, we construct a bivariate sample (X,Y) given by $(x_i,
y_i)$, $i = 1, 2, \ldots, n$. (It is common to use capitalized
letters for random variables and lower case letters with
appropriate subscripts for the observed samples corresponding to
the random variables.) A straightforward bivariate sampling takes
two RN streams at a time; but this results in

$$\frac{k(k-1)}{2}$$

possible bivariate samples, in which each bivariate sample shares
one of the streams with $2(k-2)$ other bivariate samples, or

$$\frac{k}{2}$$

bivariate samples, in which no streams are shared among the
bivariate samples. If k=10,000, then the number of bivariate
samples we need to analyze to capture all possible correlations
will be nearly 50 million. To reduce the computational complexity,
we construct k or fewer bivariate samples in which each RN stream
is checked for correlation with one or more of the other RN
streams. This is explained in the following steps.
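
As a quick check of the counts quoted above, the short sketch below
evaluates them for k=10,000. The helper names `all_pairs_count` and
`overlapping_samples` are illustrative, not part of the disclosure.

```python
def all_pairs_count(k: int) -> int:
    """Bivariate samples needed when every pair of k streams is tested."""
    return k * (k - 1) // 2


def overlapping_samples(k: int) -> int:
    """Other pairs that share one stream with any given pair."""
    return 2 * (k - 2)


k = 10_000
print(all_pairs_count(k))      # 49995000, i.e. nearly 50 million
print(overlapping_samples(k))  # 19996
print(k // 2)                  # 5000 pairs when no stream is shared
```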

Step 1. Mix the RN Streams in One of the Following Ways

Biased Interleaving:

Use n numbers from $S_1$ as the n observations on the X variate,
and interleave the remaining $k-1$ streams to provide n
observations on the Y variate.

(An alternative approach is to use coarse interleaving of the $k-1$
streams. Let n be a large multiple of $(k-1)$. Take the first n RNs
from $S_1$ to form the n observations on X. Take the first
$n/(k-1)$ RNs from $S_2$, the second $n/(k-1)$ RNs from $S_3$, and
so on to form n values on Y. Extensive testing showed that both
methods of interleaving give statistically similar results. The
first approach is oblivious to the total number of RNs to be
generated by each stream, which may simplify the generation and
storage of the random numbers.)

This gives $(x_i, y_i)$, $i = 1, 2, \ldots, n$, with $S_1$ as the
selected stream. This can be repeated with $S_i$, $i = 2, \ldots,
k$, as the selected stream providing X values and

$$\frac{n}{k-1}$$

RNs from each of the other $k-1$ streams providing Y values. In
this method, each (X,Y) bivariate sample shares (overlaps)

$$\frac{n(k-2)}{k-1}$$

of its Y values with each of the other bivariate samples.
Group Interleaving:

This method of mixing the RN streams extends the concept of biased
interleaving to form bivariate samples with no overlap, which may
be desirable for statistical test methods. In this method, the
given k RN streams are grouped into groups of h streams each, where
$2 \le h \le k$. There will be g groups, where

$$g = \left\lfloor \frac{k}{h} \right\rfloor$$

Therefore, group interleaving uses $gh$ streams for correlation
calculations. (If h does not divide k evenly without any remainder,
then $gh < k < gh + h$.) Using the streams in each group, a
bivariate sample is formed as follows. One of the streams from the
group is selected to provide n observations of the X variate. The
remaining $h-1$ streams are interleaved to provide n values for the
Y variate; each of these streams provides up to

$$\left\lceil \frac{n}{h-1} \right\rceil$$

random numbers. (As indicated earlier, fine or coarse interleaving
may be used to interleave the $h-1$ streams.) This gives g
bivariate samples each with n observations. There is no sharing of
random numbers among the bivariate samples.

Shuffled  Interleaving: 

This method is a variation of group interleaving, obtained by
interleaving all streams of the group evenly and taking the values
in the odd-numbered positions to form the X variate and the values
in the even-numbered positions to form the Y variate. Shuffled
interleaving also produces g different (X,Y) stream pairs with no
overlapping. For the special case of h=k, there is only one group,
resulting in only one (X,Y) bivariate sample; this special case is
the state of the art for statistical testing of interstream
correlations.
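
The biased, group and shuffled mixing methods described above can
be sketched as follows. This is a minimal illustration, assuming
each stream is an in-memory sequence, that fine (round-robin)
interleaving is used, and that the first stream of each group is
the selected one. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Stream = Sequence[float]
Sample = Tuple[List[float], List[float]]  # (X observations, Y observations)


def round_robin(streams: Sequence[Stream], n: int) -> List[float]:
    """Fine interleaving: draw one RN from each stream in turn, n in total."""
    m = len(streams)
    return [streams[pos % m][pos // m] for pos in range(n)]


def biased_interleave(streams: Sequence[Stream], selected: int, n: int) -> Sample:
    """X comes from the selected stream, Y from all remaining streams."""
    others = [s for i, s in enumerate(streams) if i != selected]
    return list(streams[selected][:n]), round_robin(others, n)


def group_interleave(streams: Sequence[Stream], h: int, n: int) -> List[Sample]:
    """One sample per group of h streams; no stream is shared between groups."""
    g = len(streams) // h
    samples = []
    for j in range(g):
        group = streams[j * h:(j + 1) * h]
        samples.append((list(group[0][:n]), round_robin(group[1:], n)))
    return samples


def shuffled_interleave(streams: Sequence[Stream], h: int, n: int) -> List[Sample]:
    """Interleave every stream of a group, then split odd/even positions."""
    g = len(streams) // h
    samples = []
    for j in range(g):
        mixed = round_robin(streams[j * h:(j + 1) * h], 2 * n)
        samples.append((mixed[0::2], mixed[1::2]))  # positions 1,3,5.. and 2,4,6..
    return samples


# Four toy streams; stream i holds 10*i, 10*i+1, ... so origins stay visible.
streams = [[10 * i + j for j in range(6)] for i in range(4)]
print(biased_interleave(streams, 0, 6))
print(group_interleave(streams, 2, 3))   # h=2 is pairwise interleaving
print(shuffled_interleave(streams, 4, 2))
```

With h=2 the group and shuffled variants produce the same samples,
which matches the description of pairwise interleaving as a special
case of both.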

Pairwise  Interleaving: 

A special case of group interleaving (and shuffled interleaving) is
pairwise interleaving, which is obtained by choosing h=2; each
group is a pair of streams. Therefore, pairwise interleaving uses n
RNs from stream $S_1$ as the n observations of the X variate and n
RNs from $S_2$ as the observations of the Y variate from the first
group. This again gives $(x_i, y_i)$, $i = 1, 2, \ldots, n$. This
can be repeated to obtain up to

$$\frac{k}{2} - 1$$

additional pairs with stream $S_i$, $i = 2, 4, \ldots$, providing X
values and stream $S_{i+1}$ providing Y values.

Step 2. Calculate Correlation of X,Y Streams

Consider a pair of values $(x_i, y_i)$, $i = 1, 2, \ldots, n$,
taken one each from the two streams. If the RNs are integers in the
range $[0, m-1]$, then they are converted to reals in the range
$(0, 1]$ using the conversion

$$\frac{RN + 1}{m}$$

where RN is an integer random number. If the RNs are from uniform
$[0, 1)$, then they are converted to the $(0, 1]$ range using the
conversion $1 - RN$. If the RNs are from the uniform $(0, 1)$
distribution, no additional preprocessing is needed. Let the
resulting random variates be denoted $ux_i$ and $uy_i$. The
Box-Muller transform given by the following equations is applied to
convert RNs to normal random variates, $zx_i$ and $zy_i$. (All
logarithms are to the base e.)

$$r^2 = -2 \log(ux_i) \qquad (1)$$

$$\theta = 2\pi\, uy_i \qquad (2)$$

$$zx_i = r \cos\theta \qquad (3)$$

$$zy_i = r \sin\theta \qquad (4)$$

The correlation coefficient of the bivariate normal pairs $(zx_i,
zy_i)$, $i = 1, 2, \ldots, n$, is computed.

The Box-Muller transform is not symmetric in the sense that
switching the (X,Y) ordering yields a different correlation
coefficient value. In particular, the Box-Muller transform is
sensitive to the RN streams used for Y variates and amplifies the
correlations among the RN streams used for Y variates to calculate
different $\theta$'s. If the selected stream is used to draw
observations for X and the interleaved stream is used to draw
observations for Y with biased interleaving, then the Box-Muller
transform correctly amplifies the correlation among the different
versions of the interleaved streams used for Y. Any pair of
interleaved streams formed by biased interleaving share

$$\frac{n(k-2)}{k-1}$$

values, and the quality metric computed in the next step is
dominated by the correlation among the interleaved streams. To
avoid this, since the purpose of the ISC test is to find
correlations among different individual streams, the interleaved
stream should be used for the observations of X and the selected
stream for the observations of Y when biased interleaving is used
to mix RN streams. For group, shuffled, and pairwise interleaving
the order of the streams is not an issue since all streams used for
X and Y variates are independent.

Correlation coefficients from several pairs of streams generated
using the biased interleaving are obtained. Let these coefficients
be denoted $r_1, r_2, \ldots, r_k$. Each $r_i$ gives the
interstream correlations from a selected stream to the rest of the
streams.

If group or shuffled interleaving is used, $r_1, r_2, \ldots, r_g$,
where

$$g = \left\lfloor \frac{k}{h} \right\rfloor$$

and h is the group size, are the interstream correlations with
$r_i$ representing the correlation coefficient between streams
$S_{ih}, S_{ih+1}, \ldots, S_{ih+h-1}$. For the special case of
pairwise interleaving,

$$r_1, r_2, \ldots, r_{\lfloor k/2 \rfloor}$$

are the interstream correlations, where $r_i$ represents the
correlation coefficient of the i-th pair of streams.

(Alternatively, the polar transform may be used to convert $(x_i,
y_i)$ pairs to normal random variate pairs. First, $x_i$ and $y_i$
are converted to reals in the range $(-1, 1)$. If the RNs are
integers, they can be converted into reals in the range $(-1, 1)$.
If the RNs are from the uniform $(0, 1)$ distribution, then the
numbers are extended to the $(-1, 1)$ range. Let these be denoted
$ux_i$ and $uy_i$. If $ux_i^2 + uy_i^2 > 1$, the $(x_i, y_i)$ pair
is rejected and another pair from the streams is chosen and tested
for suitability. This is repeated until a suitable pair is found.
The processed values $ux_i$ and $uy_i$ of the $(x_i, y_i)$ pair
that is found suitable are used to compute the corresponding normal
random variates pair using the following equations.

$$s = ux_i^2 + uy_i^2 \qquad (5)$$

$$zx_i = ux_i \sqrt{\frac{-2\log(s)}{s}} \qquad (6)$$

$$zy_i = uy_i \sqrt{\frac{-2\log(s)}{s}} \qquad (7)$$

Since it rejects RN pairs that are simultaneously too large or too
small, ISC testing based on the polar transform may result in the
underestimation of the actual inter-stream correlations. Therefore,
the polar transform is not recommended for ISC testing and the
computation of the PRNG quality metric. However, the polar
transform may be used to reduce the correlations between a given
pair of RN streams by removing RN pairs that result in $s \ge 1$.)

Step 3. Compute the Overall Interstream Correlation Metric

The sequence of r's obtained in the previous step denotes k (or
$k/2$ if pairwise interleaving is used) estimates of the actual
correlation coefficient $\rho$ among the streams converted using
the Box-Muller transform. The RNG quality metric may be obtained by
converting the r's to normal variates using Fisher's
z-transformation and using one of the correlation-coefficient
combining methods described below.

3.1. Aggregation Method

Let $r_i$, $i = 1, \ldots, k$, be a correlation coefficient based
on $n_i$ bivariate pairs. In the present disclosure, $n_1 = n_2 =
\ldots = n_k = n$. Let $N = kn$.

Define

$$Z_i = \frac{1}{2} \log_e\left(\frac{1 + r_i}{1 - r_i}\right) \qquad (8)$$

Let

$$Z_w = \frac{\sum (n_i - 3) Z_i}{\sum (n_i - 3)} \qquad (9)$$

An estimate of the common correlation $\rho$ is

$$r_F = \tanh(Z_w) = \frac{e^{2Z_w} - 1}{e^{2Z_w} + 1} \qquad (10)$$

An alternative expression for $r_F$ in terms of the $r_i$ is

$$r_F = \frac{\prod (1 + r_i)^{\lambda_i} - \prod (1 - r_i)^{\lambda_i}}{\prod (1 + r_i)^{\lambda_i} + \prod (1 - r_i)^{\lambda_i}} \qquad (11)$$

with

$$\lambda_i = \frac{n_i - 3}{N - 3k}, \quad i = 1, 2, \ldots, k$$

For the case of equal sample size, the following bias-corrected
transform

$$Z_H = Z_w - \frac{r_F}{2n - \frac{9}{2}} \qquad (12)$$

may be used to estimate $\rho$ by

$$r_H = \tanh(Z_H) = \frac{e^{2Z_H} - 1}{e^{2Z_H} + 1} \qquad (13)$$

We can use the statistic $T_H = Z_H \sqrt{N - 3k}$ to test the
hypothesis $H_0$: $\rho = 0$. Under the null hypothesis $H_0$,
$T_H$ has an asymptotic standard normal distribution. This gives a
significance level above which the null hypothesis cannot be
rejected. This significance level can be used to determine the
quality of the PRNG.

3.2. Percentile Method

To compute the quality metric, a significance level $\alpha$ is
chosen and the $r_h = r_{1-\alpha/2}$ and $r_l = r_{\alpha/2}$
quantile values are taken from the sorted sequence of r's. Fisher's
z-transformation given by the following equation is applied to both
quantiles to obtain $Z_h$ and $Z_l$.

$$Z_j = 0.5 \log\left(\frac{1 + r_j}{1 - r_j}\right) \qquad (14)$$

The quality of the PRNG is given by the significance level at which
$Z_h < 2.33$ and $Z_l > -2.33$, where 2.33 is the 99th percentile
(0.99 quantile) for the standard normal random variable.

Alternatively, the significance level for the selection of r
quantiles may be fixed and the significance level at which $Z_h$
and $Z_l$ satisfy the corresponding Z-quantiles may be taken as a
PRNG quality metric.
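
The following is a minimal sketch of Step 2 and of the aggregation
method of Section 3.1, under several stated assumptions: the
streams are uniform on [0, 1) (so the 1 - RN conversion applies),
the sample size is the same for every pair, the statistic is formed
from the weighted mean of the Fisher z values without any bias
correction, and the p-value is two-sided. Python's built-in
Mersenne Twister stands in for the PRNG under test. All function
names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def box_muller(ux: float, uy: float) -> Tuple[float, float]:
    """Map ux, uy in (0, 1] to a pair of normal variates (Box-Muller)."""
    r = math.sqrt(-2.0 * math.log(ux))
    theta = 2.0 * math.pi * uy
    return r * math.cos(theta), r * math.sin(theta)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample correlation coefficient of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def stream_correlation(x_stream: Sequence[float], y_stream: Sequence[float]) -> float:
    """Correlate two uniform [0, 1) streams after the Box-Muller transform."""
    pairs = [box_muller(1.0 - x, 1.0 - y) for x, y in zip(x_stream, y_stream)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


def aggregate(rs: Sequence[float], n: int) -> Tuple[float, float, float]:
    """Combine correlation coefficients that each come from n pairs.

    Returns (common correlation estimate, test statistic, p-value).
    """
    count = len(rs)
    z = [0.5 * math.log((1.0 + r) / (1.0 - r)) for r in rs]  # Fisher z
    z_w = sum(z) / count             # weighted mean; weights equal when n_i = n
    r_f = math.tanh(z_w)             # estimate of the common correlation
    # Assumption: no bias correction is applied before forming the statistic.
    t = z_w * math.sqrt(count * (n - 3))
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal p-value
    return r_f, t, p


rng = random.Random(2011)
n = 1000
rs: List[float] = []
for _ in range(4):                   # four stream pairs (pairwise interleaving)
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(n)]
    rs.append(stream_correlation(xs, ys))
r_f, t, p = aggregate(rs, n)
print(f"r_F={r_f:.4f}  T={t:.3f}  p={p:.3f}")
```

A small p-value would argue against the independence of the
streams; a p-value that is not small only means that independence
cannot be rejected at that level.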

3.3.  Goodness-of-Fit  Method 

The Kolmogorov-Smirnov (KS) test is a goodness-of-fit test
method that may be used instead of the aggregate method to
determine the correlation among the RN streams under
consideration. The method is applied as follows. Each $r_i$, $1 \le i \le k$, is
converted to a standard normal variate using the Fisher
z-transform described above, and the results are sorted in ascending
order to obtain $z_i$, $i = 1, \ldots, k$. For each $z_i$, the corresponding
cumulative probability, $f_i$, is computed. If the $r_i$’s are normally
distributed, then the cumulative probabilities will be uniformly
spaced in the interval [0, 1]. The KS test statistic, D, the
maximum deviation of $f_i$, $i = 1, \ldots, k$, from a true uniform
distribution, is computed as follows.


$$D = \max_{1 \le i \le k} \left( f_i - \frac{i-1}{k},\ \frac{i}{k} - f_i \right) \qquad (15)$$


If D is below the critical value for a given significance level,
then the hypothesis that the $r_i$’s are normally distributed cannot
be rejected at that significance level. Critical values for the
KS test, precomputed for various significance levels, are given
in most standard books on statistics.
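
A minimal sketch of the KS statistic computation is given below. It assumes, beyond what the text states, that every r is computed from the same number of bivariate pairs n and that the Fisher z value is scaled by sqrt(n - 3) to obtain a standard normal variate under the hypothesis of zero correlation. The function name `ks_statistic` is hypothetical.

```python
import math
from statistics import NormalDist


def ks_statistic(rs, n_pairs):
    """KS distance between the standardized Fisher z values of rs and N(0, 1)."""
    scale = math.sqrt(n_pairs - 3)  # assumed standardization under rho = 0
    # math.atanh(r) equals 0.5 * log((1 + r) / (1 - r)), the Fisher z-transform.
    zs = sorted(scale * math.atanh(r) for r in rs)
    k = len(zs)
    phi = NormalDist().cdf
    d = 0.0
    for i, z in enumerate(zs, start=1):
        f = phi(z)  # cumulative probability of the i-th smallest z
        d = max(d, f - (i - 1) / k, i / k - f)
    return d
```

The returned value is then compared with a tabulated critical value for the chosen significance level and the number of r’s.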

3.4. Mean Absolute Deviation Method

Let $x_q$ be the q-quantile value in the sorted sequence of $r_i$’s.
Also, let $x_{q_1}, x_{q_2}, \ldots, x_{q_m}$ be m $r_i$’s selected at quantiles
$q_1, \ldots, q_m$ from this sequence. Using Fisher’s z-transform
above, the corresponding standard normal values
$z_{q_1}, z_{q_2}, \ldots, z_{q_m}$ are computed. From these, the corresponding
cumulative probabilities for the z values are computed; let
them be $f_{q_1}, f_{q_2}, \ldots, f_{q_m}$. The mean absolute deviation is
computed using the following equation.


$$E = \frac{1}{m} \sum_{i=1}^{m} \left| f_{q_i} - q_i \right| \qquad (16)$$

There is no critical value against which E can be compared;
the lower the value of E, the better. Though the KS test requires
more computation, it is a more thorough test and should be
preferred to the mean absolute deviation test. On the other
hand, for on-the-fly testing of very long RN streams, the mean
absolute deviation method may be more practical to implement.
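
The mean absolute deviation of equation (16) can be sketched as below. As with the other sketches, the nearest-rank quantile rule and the sqrt(n - 3) scaling of the Fisher z values are assumptions made for illustration, and `mean_abs_deviation` is a hypothetical name.

```python
import math
from statistics import NormalDist


def mean_abs_deviation(rs, n_pairs, quantiles):
    """Mean of |f_q - q| over the requested quantiles q of the sorted r values."""
    ordered = sorted(rs)
    k = len(ordered)
    scale = math.sqrt(n_pairs - 3)  # assumed standardization under rho = 0
    phi = NormalDist().cdf
    total = 0.0
    for q in quantiles:
        # Nearest-rank position of the q-quantile (assumption).
        idx = min(k - 1, max(0, math.ceil(q * k) - 1))
        f_q = phi(scale * math.atanh(ordered[idx]))
        total += abs(f_q - q)
    return total / len(quantiles)
```

Only the m selected quantiles are transformed, which is what makes this method cheaper than the KS test for very long runs.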

Application  of  ISC  Test  to  a  Single  Stream 

In some embodiments, an ISC test may be used to determine
intra-stream correlations as follows. A single stream
may be segmented into k substreams by leap-frog or
cycle-division methods, or by any other method. In the leap-frog
method, substream i, $1 \le i \le k$, consists of RNs in positions i, k+i,
2k+i, ... of the stream. In the cycle-division method, k pairwise
disjoint subsets, each containing n consecutive RNs of
the original single RN stream, are picked. An ISC test can be
applied on the substreams to obtain the quality metric as in the
case of parallel RN streams. In this case, however, the ISC test
gives the quality metric based on the intrastream correlations.
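
The two segmentation methods can be illustrated with a short sketch. The function names are hypothetical, substreams are indexed from 0 rather than 1, and the cycle-division variant simply takes the first k blocks back to back, which is only one of many ways to pick k disjoint runs of n consecutive RNs.

```python
def leap_frog(stream, k):
    """Substream i (0-based) holds elements i, k + i, 2k + i, ... of the stream."""
    return [stream[i::k] for i in range(k)]


def cycle_division(stream, k, n):
    """Return k disjoint blocks of n consecutive elements each."""
    if k * n > len(stream):
        raise ValueError("stream is too short for k blocks of length n")
    return [stream[i * n:(i + 1) * n] for i in range(k)]
```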
In some embodiments, an interstream correlation (ISC)
test evaluates a large number of parallel RN streams simultaneously
and provides a quality metric. The ISC test may
divide the total streams to be evaluated into subsets of
streams, and compute a correlation coefficient for each subset.
These correlation coefficients may be combined using a
theoretically sound test method such as the Donner and Rosner
test (DR test) or the Kolmogorov-Smirnov test (KS test), and
a test statistic may be obtained. If the test statistic is higher
than a suitably determined critical value, the claim of independent
RN streams is rejected. A lack of rejection indicates
that the RN streams are likely to be independent.


In  some  embodiments,  an  interstream  correlation  test 
evaluates  correlations  among  a  large  number  of  RN  streams. 
Using a test method such as the Donner and Rosner test or the
Kolmogorov-Smirnov test, the interstream correlation test
may  provide  an  overall  PRNG  quality  metric.  In  some 
embodiments,  results  of  an  interstream  correlation  test  are 
used  in  conjunction  with  other  single-stream  test  batteries  and 
application-based  tests.  The  test  may  be  used  to  evaluate 
interstream  correlations  among  billions  of  RN  streams. 

In an embodiment, an interstream correlation test evaluates
the correlations among a large number of subsets. The subsets
may be interleaved using a shuffled or biased interleaving
method. As one example, three RN streams A, B, and C may be
considered with RNs $a_1, a_2, a_3, \ldots$; $b_1, b_2, b_3, \ldots$; and
$c_1, c_2, c_3, \ldots$, respectively. In shuffled interleaving (also called
perfect shuffle interleaving), a new stream $a_1, b_1, c_1, a_2, b_2, c_2, a_3, \ldots$
is created. In biased interleaving, $a_1, b_1, a_2, c_1, a_3, b_2, a_4, \ldots$
is created. The RNs in the odd-numbered positions form
the X variates and the RNs in the even-numbered positions
form the Y variates to create bivariate pairs. These may be
transformed into bivariate normal pairs using the Box-Muller
transform. The correlation coefficient, r, for the bivariate normal
pairs is computed. This may be repeated several times to
obtain multiple r’s. Collectively, these r’s are the samples that
can be used to estimate $\rho$, the true common correlation coefficient
among the parallel RN streams generated by the
PRNG being evaluated.
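
The interleaving and correlation steps can be sketched as follows. The sketch assumes that each (X, Y) pair of uniform RNs is fed to the basic Box-Muller transform as (u1, u2) and that pairs with u1 equal to zero are skipped; the function names are hypothetical and the code is not the described implementation.

```python
import math


def shuffled_interleave(streams):
    """a1, b1, c1, a2, b2, c2, ...: one element from each stream in turn."""
    return [x for group in zip(*streams) for x in group]


def biased_interleave(streams):
    """a1, b1, a2, c1, a3, b2, ...: the first stream alternates with the others."""
    first, others = streams[0], streams[1:]
    m = len(others)
    out = []
    for j, a in enumerate(first):
        partner = others[j % m]   # rotate through the remaining streams
        idx = j // m              # next unused element of that stream
        if idx >= len(partner):
            break
        out += [a, partner[idx]]
    return out


def pair_correlation(mixed):
    """Pearson r of Box-Muller pairs built from odd/even positions of mixed."""
    xs, ys = [], []
    for u1, u2 in zip(mixed[0::2], mixed[1::2]):
        if u1 <= 0.0:
            continue  # log(0) is undefined; skip the pair
        radius = math.sqrt(-2.0 * math.log(u1))
        xs.append(radius * math.cos(2.0 * math.pi * u2))
        ys.append(radius * math.sin(2.0 * math.pi * u2))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Repeating `pair_correlation` over different subsets of streams yields the collection of r’s that the test methods combine.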

The r’s may be combined using a theoretically sound test
method such as the Donner and Rosner test (DR-test) or the
Kolmogorov-Smirnov test (KS-test). Based on the test data, a test
statistic may be obtained. For purposes of this example, the
statistic for the DR-test is denoted as W and the statistic for the
KS-test as $D_{max}$. For each test, there may be a critical value
that is computed based on the desired significance level and
the number of r’s used. For example, for the DR-test at a significance
level of 0.05, the critical value may be 1.96 provided the
number of bivariate pairs used to calculate each r is large and
the number of r’s is more than 2. If the test statistic is above the
critical value, then the RN streams generated by the PRNG
are likely to have significant interstream correlations.

In this example, the DR-test combines the r’s and gives the
test statistic $\tau_H$, which is a standard normal variate. This can be
used to test the null hypothesis $H_0: \rho = 0$. Large absolute values
of $\tau_H$ will lead to the rejection of the null hypothesis and the
acceptance of the alternative hypothesis $H_1: \rho \ne 0$. For the
significance level $\alpha = 0.05$, absolute values of $\tau_H$ above 1.96 lead
to the rejection of the claim that parallel RN streams are
independent. The probability that the rejection is erroneous is
$\alpha = 0.05$. One could use different significance levels: for
$\alpha = 0.02$, absolute values of $\tau_H$ above 2.33 will lead to
rejection of the claim of independence of RN streams with
only 0.02 probability of being wrong.
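
The critical values quoted above follow from the standard normal quantile function, as the short sketch below shows. It uses only the Python standard library; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Standard normal quantile at 1 - alpha/2, for a two-sided test."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_independence(statistic, alpha=0.05):
    """True when |statistic| exceeds the two-sided critical value for alpha."""
    return abs(statistic) > two_sided_critical_value(alpha)
```

For example, a statistic of 2.1 would be rejected at a significance level of 0.05 but not at 0.02.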

The distribution of r’s may be approximately normal.
These r’s can be converted into standard normal variates
using the sample variance of the r’s, testing for $\rho = 0$. The KS test may
be applied on the distribution of r’s. In this case, the KS-test
statistic, $D_{max}$, computed using the r’s is to be less than the
critical value, $D_{\alpha,n}$, for significance level $\alpha$ and n, the number
of r’s used. For the KS-test at a significance level of 0.01, the
critical value may be 0.0274 when the number of r’s used is
1500.

In  some  embodiments,  r’s  may  be  combined  using  other 
computationally more complex tests such as the Anderson-Darling
or Shapiro-Wilk tests.

In some embodiments, r’s may be combined using computationally
simpler tests such as the percentile method and
mean absolute deviation method. The simpler methods may
be preferred for online tests to reduce the computing
resources used for quality metric computations, whereas the
more complex methods may be preferred for offline tests.

Systems  and  methods  described  herein  may  be  used  in  a 
variety  of  applications.  Examples  of  applications  for  systems 
and  methods  as  described  herein  include  (a)  simulation-based 
solutions  to  large  scientific  and  engineering  problems,  (b) 
parameterized Monte Carlo simulations of scientific, engineering,
and finance problems, (c) distributed computing, and
(d)  protocols  and  keys  used  for  information  assurance  and 
security. 

Systems and methods described herein, such as the ISC
tester described above relative to FIG. 1, may be implemented
in hardware, including field programmable gate arrays (FPGAs)
and application specific integrated circuit (ASIC) chips,
or in a suitable combination of hardware and software, which
can be one or more software systems on a general
purpose processor (CPU) or graphics processing unit (GPU).

Computer  systems  may,  in  various  embodiments,  include 
components  such  as  a  CPU  with  an  associated  memory 
medium such as Compact Disc Read-Only Memory (CD-ROM).
The memory medium may store program instructions
for computer programs. The program instructions may be
executable by the CPU. Computer systems may further
include a display device such as a monitor, an alphanumeric
input device such as a keyboard, a directional input device such
as a mouse, a voice recognition system to dictate text and issue
commands for processing, and a touch screen that may serve
as a keyboard or mouse. Computer systems may be operable
to  execute  the  computer  programs  to  implement  computer- 
implemented  systems  and  methods.  A  computer  system  may 
allow  access  to  users  by  way  of  any  browser  or  operating 
system. 

Embodiments  of  a  subset  or  all  (and  portions  or  all)  of  the 
above  may  be  implemented  by  program  instructions  stored  in 
a  memory  medium  or  carrier  medium  and  executed  by  a 
processor.  A  memory  medium  may  include  any  of  various 
types of memory devices or storage devices. The term
“memory medium” is intended to include an installation
medium, e.g., a Compact Disc Read Only Memory (CD-ROM),
floppy disks, or tape device; a computer system
memory or random access memory such as Dynamic Random
Access Memory (DRAM), Double Data Rate Random
Access Memory (DDR RAM), Static Random Access
Memory (SRAM), Extended Data Out Random Access
Memory (EDO RAM), Rambus Random Access Memory
(RAM), etc.; or a non-volatile memory such as magnetic
media, e.g., a hard drive (which may be a disk or solid state),
or optical storage. The memory medium may comprise other
types of memory as well, or combinations thereof. In addition,
the memory medium may be located in a first computer
in  which  the  programs  are  executed,  or  may  be  located  in  a 
second  different  computer  that  connects  to  the  first  computer 
over  a  network,  such  as  the  Internet.  In  the  latter  instance,  the 
second  computer  may  provide  program  instructions  to  the 
first  computer  for  execution.  The  term  “memory  medium” 
may  include  two  or  more  memory  mediums  that  may  reside  in 
different locations, e.g., in different computers that are connected
over a network. In some embodiments, a computer
system at a respective participant location may include a
memory medium(s) on which one or more computer programs
or software components according to one embodiment
may  be  stored.  For  example,  the  memory  medium  may  store 
one  or  more  programs  that  are  executable  to  perform  the 
methods  described  herein.  The  memory  medium  may  also 
store  operating  system  software,  as  well  as  other  software  for 
operation  of  the  computer  system. 
The memory medium may store a software program or
programs operable to implement embodiments as described
herein. The software program(s) may be implemented in various
ways, including, but not limited to, procedure-based
techniques, component-based techniques, and/or object-oriented
techniques, among others. For example, the software programs
may be implemented using ActiveX controls, C++
objects, as a library or standalone programs in a programming
language such as C, C++, Java or in a scripting language such
as Bash, Perl, Python, or AWK, JavaBeans, Microsoft Foundation
Classes (MFC), browser-based applications (e.g., Java
applets), traditional programs, or other technologies or methodologies,
as desired. A CPU executing code and data from
the memory medium may include a means for creating and
executing the software program or programs according to the
embodiments described herein.

The  ISC  Tester  may  be  embedded  in  an  application  or  may 
be  combined  with  a  random  number  generator. 

Further modifications and alternative embodiments of various
aspects of the invention may be apparent to those skilled
in the art in view of this description. Accordingly, this
description is to be construed as illustrative only and is for the
purpose of teaching those skilled in the art the general manner
of carrying out the invention. It is to be understood that the
forms of the invention shown and described herein are to be
taken as embodiments. Elements and materials may be substituted
for those illustrated and described herein, parts and
processes may be reversed, and certain features of the invention
may be utilized independently, all as would be apparent
to one skilled in the art after having the benefit of this description
of the invention. Methods may be implemented manually,
in software, in hardware, or a combination thereof. The
order of any method may be changed, and various elements
may be added, reordered, combined, omitted, modified, etc.
Changes may be made in the elements described herein without
departing from the spirit and scope of the invention as
described in the following claims.

What is claimed is:

1. A method of assessing parallel random number streams,
comprising:

creating mixed random number streams by mixing two or
more parallel random number streams, wherein mixing
the two or more parallel random number streams comprises
pairing at least one of the random number streams
with at least one other of the random number streams;

computing, by a computer system, for each of the mixed
random number streams, an inter-stream correlation
value based on a correlation between the bivariate pairs
constructed from the mixed stream; and

determining, from inter-stream correlation values for two
or more mixed random number streams, a quality metric
for the parallel random number streams.

2. The method of claim 1, wherein determining the quality
metric comprises off-line testing of the two or more parallel
random number streams, wherein the two or more parallel
random number streams are generated by a random number
generation system for purposes of testing the random number
generation system.

3. The method of claim 1, wherein determining the quality
metric comprises on-line testing of the two or more parallel
random number streams during consumption of the random
numbers by an application.

4. The method of claim 1, wherein determining the quality
metric comprises combining inter-stream correlation values
for at least two random number streams.

5. The method of claim 1, wherein mixing two or more
parallel random number streams comprises receiving a user
selection of a mixing approach.

6. The method of claim 1, wherein the set of all streams
may be mixed.

7. The method of claim 1, wherein the set of all streams
may be grouped into subsets.

8. The method of claim 1, wherein mixing a set or subset of
three or more parallel random number streams comprises
biased interleaving of a stream with the remaining streams in
the set or subset.

9. The method of claim 1, wherein mixing a set or subset of
two or more parallel random number streams comprises
shuffled interleaving of all streams in the set or subset.

10. The method of claim 1, wherein mixing a set or subset
of two parallel random number streams comprises pair-wise
interleaving of the two streams.

11. The method of claim 1, wherein determining the quality
metric comprises receiving a user selection of a test method.

12. The method of claim 1, wherein the quality metric
comprises a significance level, wherein the significance level
comprises a level above which a claim of independence cannot
be rejected.

13. The method of claim 1, wherein the quality metric is
tested against a user-specified significance level.

14.  The  method  of  claim  1,  wherein  the  quality  metric  is 
determined  based  on  an  aggregate  method. 

15.  The  method  of  claim  1,  wherein  the  quality  metric  is 
determined  based  on  a  goodness-of-fit  method. 

16. The method of claim 1, wherein the quality metric is
determined  based  on  a  percentile  method. 

17.  The  method  of  claim  1,  wherein  the  quality  metric  is 
determined  based  on  a  mean  absolute  deviation  method. 

18. The method of claim 1, further comprising applying a
polar transform to remove some bivariate pairs from a mixed
random number stream from the determination of the quality
metric, wherein removing the one or more bivariate pairs
reduces correlations among the random number streams used
in creating the mixed random number stream.

19. The method of claim 1, further comprising determining
whether the quality metric for the two or more parallel random
number streams meets a user-specified significance
level.

20.  A  system,  comprising: 

a  processor; 

a memory coupled to the processor, wherein the memory
comprises program instructions executable by the processor
to implement:

creating mixed random number streams by mixing two or
more parallel random number streams, wherein mixing
the two or more parallel random number streams comprises
pairing at least one of the random number streams
with at least one other of the random number streams;

computing,  for  each  of  the  mixed  random  number  streams, 
an  inter-stream  correlation  value  based  on  a  correlation 
between  the  bivariate  pairs  constructed  from  the  mixed 
stream;  and 

determining,  from  inter-stream  correlation  values  for  two 
or  more  mixed  random  number  streams,  a  quality  metric 
for  the  parallel  random  number  streams. 

21.  A  non-transitory,  computer-readable  storage  medium 
comprising  program  instructions  stored  thereon,  wherein  the 
program  instructions  are  configured  to  implement: 

creating mixed random number streams by mixing two or
more parallel random number streams, wherein mixing
the two or more parallel random number streams comprises
pairing at least one of the random number streams
with at least one other of the random number streams;

computing,  for  each  of  the  mixed  random  number  streams, 
an  inter-stream  correlation  value  based  on  a  correlation 
between  the  bivariate  pairs  constructed  from  the  mixed 
stream;  and 

determining,  from  inter-stream  correlation  values  for  two 
or  more  mixed  random  number  streams,  a  quality  metric 
for  the  parallel  random  number  streams. 
PRIORITY CLAIM


[0001] This application claims priority to U.S. Provisional Application
No. 61/454,856 entitled “GENERATION OF DISTINCT PSEUDORANDOM
NUMBER STREAMS BASED ON PROGRAM CONTEXT” to Boppana filed
March 21, 2011, which is incorporated herein by reference in its entirety.


BACKGROUND


Field


[0002] This disclosure is generally related to parallel computing
applications, simulation codes and protocols that use pseudorandom numbers and
more specifically to algorithms and methods to generate pseudorandom numbers.


Description  of  the  Related  Art 


[0003] Many important scientific computing applications, business and
finance applications, and complex systems modeling and analysis techniques use
pseudorandom number generators (“RNGs”). These applications may take advantage
of the availability of thousands of computing cores on heterogeneous systems
comprising multi-core processors (“CPUs”) and highly parallel general purpose
graphics processing units (“GPUs”), provided that suitable parallel pseudorandom
number generators (“PRNGs”) are available to simultaneously feed thousands of
computing streams with high quality random number (“RN”) streams with low intra-
and inter-stream correlations (inter-stream correlations may be referred to herein as
“ISCs”).

[0004] A parallel or distributed application has a computational task that
may be divided into several thousands or millions of subtasks, with each subtask
executed by a separate thread or process (henceforth, process). Each process has a
distinct ID that is usually logically numbered within the context of the application
execution.

[0005] For an iterative parallel application, each process may execute
some of the iterations. For example, for a large lattice structure simulation, each
process may simulate the working of a few of the lattice points. Therefore, processes
often cycle through computing and communication modes. In the computing mode, a
process may use the available data to perform new calculations needed to make
progress toward the solution. In the communication mode, a process may send its data
or receive other processes’ data.

[0006] It is common to use the single-program multiple-data (SPMD)
programming method to code parallel applications, in which each of the processes
receives the same computer code but has explicit instructions that specify, based on
the process’s ID, its portion of the task.

[0007] If an SPMD-based parallel application code that uses random
numbers is executed, all or some of the processes (spawned for the execution of the
application code) request random numbers from the same program locations or
contexts.

[0008] In some applications, all required processes may be spawned
statically at the start of the code execution. In other applications, some of the
processes are spawned initially and any additional processes are spawned dynamically
by the existing processes based on the application data and the coded algorithm or
model. In highly complex simulation codes, the initial processes may need to spawn
additional processes, dynamically, during the execution. However, with the SPMD
programming method, all processes use the same application code with the task for
each process specified by conditional statements based on the data and the process ID.

[0009]  In  some  systems,  to  distinguish  requests  for  random  numbers  from 
different  processes,  an  application  is  coded  such  that  each  process  uses  a  RN  stream 
identifier  to  explicitly  identify  a  distinct  stream  allocated  to  it.  The  stream  allocated  to 
a process may be initialized by a special function call prior to generating or using any
RNs  from  that  stream. 

[0010] A large application code that uses RNs may be executed by
dividing the computing task among multiple processes. Typically, each process is
allocated at least one distinct RN stream to provide the RNs needed during its
computations. To improve randomness and to improve the reproducibility of results,
an application may be coded such that each portion of computing workload, for
example, each small subset of the iterations of a large iterative code, may be assigned
a distinct RN stream identifier so that each workload may use a distinct RN stream for
the necessary RNs in its execution. In such cases, especially for efficiency reasons,
each process may be assigned one or more of the computing workloads, and thus, one
or more of the distinct RN stream identifiers. Coding an application so that an RN
stream is shared by multiple processes is computationally inefficient, makes results
hard to reproduce, or both.

[0011] The RN streams may be allocated to processes based on the input
data and/or computations allocated to them. For example, if a computational loop is
partitioned cyclically among p processes, then iteration i may be executed by
process i%p; if each iteration is to use a separate RN stream and the number of
iterations is smaller than the maximum number of RN streams, then it may be natural to
allocate RN streams i, i + p, ... from the set of all RN streams to process i.
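
The cyclic allocation described in this paragraph can be sketched in a few lines. The function names and the choice of using the iteration index itself as the RN stream identifier are illustrative assumptions.

```python
def owner_of_iteration(i, p):
    """Process that executes iteration i under a cyclic partition over p processes."""
    return i % p


def streams_for_process(rank, p, num_iterations):
    """RN stream identifiers rank, rank + p, rank + 2p, ... below num_iterations."""
    return list(range(rank, num_iterations, p))
```

With p = 4 processes and 10 iterations, process 1 would use streams 1, 5, and 9, one per iteration it executes.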

[0012] One way to ensure that distinct RN streams are used is to allocate
distinct RN stream identifiers and to use a PRNG that ensures that distinct RN stream
identifiers result in initialization of distinct RN streams, which, for a well-designed
PRNG, may have low or undetectable (based on the currently available statistical and
other tests) interstream correlations.

[0013]  If  the  application  requires  each  process  or  computational  workload 
to  request  random  numbers  from  multiple  program  locations  or  contexts,  then  there 
may  be  two  options.  One  option  is  to  use  the  same  RN  stream  for  all  contexts  within  a 
process. The same contexts in two different processes will still use distinct RN
streams  provided  distinct  stream  identifiers  are  allocated  and  initialized  for  different 
processes. 

[0014] A second option is to use multiple distinct streams for multiple
contexts in each process, potentially one distinct RN stream for each distinct program
context. This second option may be desirable for better randomness properties. In
such a case, the application code is explicitly written to manage these multiple
streams. If the number of distinct streams needed for an application is not known in
advance, the maximum number of streams needed per process is estimated and that
number is allocated to each process.

[0015]  If  the  estimation  is  too  small,  then  a  program  error  is  generated  and 
execution  is  halted.  In  this  case,  the  user  needs  to  revise  the  estimate  for  the  number 
of  streams  needed  and  resubmit  the  application  for  execution. 

[0016] If the estimation is too large, then the program may run out of
distinct RN streams for processes spawned after some point. This is especially true when
parallel applications that are tuned and run on large clusters of computers with a large
number of processes are later run on even larger clusters of computers with even more
processes, by a simple change in compile-time or runtime options without application
recoding, to take advantage of the additional performance offered by the larger
hardware.

[0017] To further control the generation of RN streams, an application
may provide a single seed value, typically by a designated master process (usually
process 0), to a PRNG. The single seed value is typically a 32- or 64-bit number, often
an integer, specified by the user as part of the application’s input data. By keeping all
other input data the same and changing only the seed value, the user can run multiple
instances of the same scenario, average the results, and obtain estimates of the simulation
error (called confidence intervals in statistics).
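
As a small illustration of seed-based replication, the sketch below reruns a toy Monte Carlo estimate of pi with different seeds and reports the mean together with an approximate 95% confidence half-width. The toy problem, the function names, and the use of the normal quantile 1.96 (rather than a Student t quantile, which would be more appropriate for few replications) are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, stdev


def estimate_pi(seed, samples=20000):
    """One simulation instance: Monte Carlo estimate of pi driven by one seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def replicate(seeds, samples=20000, z=1.96):
    """Run one instance per seed; return (mean, approximate 95% half-width)."""
    estimates = [estimate_pi(s, samples) for s in seeds]
    centre = mean(estimates)
    half_width = z * stdev(estimates) / math.sqrt(len(estimates))
    return centre, half_width
```

Because every instance is fully determined by its seed, rerunning with the same seed reproduces the same estimate.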
[0018] The quality of the random numbers used may be crucial for quick
and accurate simulation-based computer solutions and for robust security
protocols and the security keys used in them. It may be desirable to use
distinct parallel RN streams if an application code calls for RNs from multiple distinct
locations so that, within a process, multiple calls for RNs from the same location (also
called program context) are satisfied by providing RNs from a specific stream, while
the calls for RNs from different locations of the program within the same computing
iteration will be satisfied by providing RNs from different streams. Distinct RN
streams across different processes may be ensured by the use of distinct RN stream
identifiers to initialize the RN streams. To use distinct RN streams for distinct
contexts within a process or computational workload, the application has to be coded
specifically to use distinct RN stream identifiers for each such program context. Such
an approach may, however, place an unreasonable burden on the application
designer and make revisions to application code, which may change the number of
program contexts from which RNs are requested, cumbersome and potentially
error-prone.


[0019] In some parameterized PRNGs, each process is given one RN
stream with an appropriately parameterized seed or iteration function. Two main
approaches to designing PRNGs are (a) splitting a sequential RN stream into multiple
substreams, with each substream treated as a distinct RN stream for application
execution purposes, and (b) parameterization of the initialization (seed) state of an
RNG with multiple random number cycles, or parameterization of the iteration
function of an RNG. The leap-frog technique, which splits a
sequential RN stream in an interleaved manner (if a sequential stream consisting of
$x_1, x_2, x_3, \ldots$ needs to be split into $k$ streams, then stream $i$ consists of RNs
$x_i, x_{k+i}, x_{2k+i}, \ldots$, $1 \le i \le k$), has received extensive attention. But it is
inherently not scalable owing to initialization cost (a large multiple of $k$ RNs must be
generated first to initialize each processor/process) and potentially increased
intra-stream correlations.
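The interleaving used by the leap-frog technique can be illustrated with a short sketch. It assumes the sequential stream has already been materialized as a list, and it uses Python's built-in generator purely as a stand-in sequential RNG; the function name `leapfrog_stream` is hypothetical.

```python
import random


def leapfrog_stream(base_sequence, i, k):
    """Return substream i (1-based) of k: x_i, x_{k+i}, x_{2k+i}, ..."""
    if not 1 <= i <= k:
        raise ValueError("stream index must satisfy 1 <= i <= k")
    # Element x_i sits at list index i - 1; every k-th element follows it.
    return base_sequence[i - 1::k]


# Stand-in sequential stream (an assumption for illustration only).
rng = random.Random(12345)
base = [rng.random() for _ in range(12)]

# Split the single stream into k = 3 interleaved substreams.
streams = [leapfrog_stream(base, i, 3) for i in (1, 2, 3)]
```

Each substream here is cut from an already generated sequence, which mirrors the initialization cost noted above: every consumer depends on the shared sequence being advanced by a multiple of $k$.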
[0020] The Mersenne twister (MT) is a variant of the feedback shift register-based
random number generator. The original generator, MT19937, which generates a
single RN stream with a very long cycle of length $2^{19937} - 1$ (that is, the sequence of
RNs repeats after generating this many RNs), is very popular and is widely
implemented in various software packages (including the GNU Scientific Library, gsl
package). SFMT19937, a parallel 128-bit version, and MTGP, a GPU version that is part
of the NVIDIA CUDA library, are also available. Using MT to generate multiple parallel
RN streams often requires splitting its sequential RN stream. This is largely an ad hoc
process since the maximum number of RNs needed in each segment needs to be
estimated. This also may compromise the randomness quality since segmenting the
stream and using the segments changes the correlations among the RNs used. Direct
parallelization by changing the parameters of MT is computationally expensive and
may not be suitable for dynamic generation of random number streams in a
high-performance simulation code.

SUMMARY


[0021] In an embodiment, a method of providing random number streams
to a process includes determining one or more program contexts within a process.
Each of the program contexts may include code that calls for one or more random
numbers. For each of at least two of the program contexts, a random number stream
is provided to the process. The random number stream for each program context is
based on the determined program context and is distinct from the random number
stream for the other program contexts in the process.

[0022] In an embodiment, a method of providing random number streams
to processes performing a parallel computation includes determining program
contexts within one process of a parallel computation. Each of the program contexts
may include code that calls for one or more random numbers. A random number
stream is provided to the process for each of the program contexts. The random
number stream provided is based in part on the determined program context and
based in part on which of the two or more processes the program context is in.

[0023] In an embodiment, a method of providing random number streams
to processes performing a parallel computation includes receiving a call for one or
more random numbers from a program context in a process of a parallel computation.
A random number stream is used to provide a random number for each such call. The
random number stream provided is based at least in part on the program context
from which the call is received.


[0024] In some embodiments, a context-aware parallel pseudorandom
number generator uses the program context in which a request for a random number is
made to automatically select and use distinct random number streams for distinct
contexts.

BRIEF DESCRIPTION OF THE DRAWINGS


[0025]  FIG.  1  is  a  block  diagram  illustrating  a  random  number  generator 
that  provides  distinct  random  number  streams  to  different  program  contexts  of  a 
parallel  computation. 

[0026]  FIG.  2  is  a  block  diagram  illustrating  a  random  number  generator 
that  can  provide  distinct  random  number  streams  to  different  program  contexts  and 
different  processes  of  a  parallel  computation  based  on  program  context  and  other 
information. 

[0027]  FIG.  3  illustrates  providing  random  number  streams  to  a  process 
based  on  a  determined  program  context. 

[0028]  FIG.  4  illustrates  one  embodiment  of  the  initialization  process  by  a 
context-aware  random  number  generator. 

[0029]  While  the  invention  is  described  herein  by  way  of  example  for 
several  embodiments  and  illustrative  drawings,  those  skilled  in  the  art  will  recognize 
that the invention is not limited to the embodiments or drawings described. It should
be understood that the drawings and detailed description thereto are not intended to
limit  the  invention  to  the  particular  form  disclosed,  but  on  the  contrary,  the  intention 
is  to  cover  all  modifications,  equivalents  and  alternatives  falling  within  the  spirit  and 
scope  of  the  present  invention  as  defined  by  the  appended  claims.  The  headings  used 
herein  are  for  organizational  purposes  only  and  are  not  meant  to  be  used  to  limit  the 
scope  of  the  description  or  the  claims.  As  used  throughout  this  application,  the  word 
"may"  is  used  in  a  permissive  sense  (i.e.,  meaning  having  the  potential  to),  rather  than 
the  mandatory  sense  (i.e.,  meaning  must).  Similarly,  the  words  “include”, 
“including”,  and  “includes”  mean  including,  but  not  limited  to. 


Atty.  Dkt.  No.:  5660-14400 



Meyertons,  Hood,  Kivlin,  Kowert  &  Goetzel,  P.C. 


DETAILED  DESCRIPTION  OF  EMBODIMENTS 


[0030] As used herein, "random number" includes a pseudorandom
number. As used herein, a "random number generator" includes a pseudorandom
number generator.

[0031]  As  used  herein,  a  “context-aware  parallel  pseudorandom  number 
generator”  means  a  parallel  pseudorandom  number  generator  which  generates  one  or 
more  random  number  streams  and  provides  random  numbers  based  on  information 
relating  to  a  program  context  for  requesting  random  numbers. 

[0032] As used herein, the phrase "primitive process", or simply
"process", is used to represent a thread or process assigned to execute one
computational workload. In some cases, a thread or process used in an execution of
the application may perform the work of multiple primitive processes.

[0033] In some embodiments, distinct random number streams are
assigned to different program contexts. The streams may be assigned such that no
two processes cooperatively working on a parallel computation use the same random
number stream. In some embodiments, the use of program context enables
context-aware parallel pseudorandom number generators to generate distinct random
number streams even for processes that use only one stream identifier but call for
random numbers from multiple locations.

[0034] In some embodiments, a collection of random number streams is
given to each process so that each distinct statement (denoted a random number
context) that calls for a random number is served with a distinct generator taken from
the PRNGs assigned to that process. To ensure that each process of the parallel
computation that executes the same code uses distinct random number streams, the
streams may, in certain embodiments, be further initialized with distinct RN stream
identifiers supplied by the application code. This RN stream identifier may be used to
determine a distinct identifier, of 64 or more bits, generated by a special library
module.


[0035] In some embodiments, random number context (RN-context) is
used in conjunction with the RN stream identifier to determine the RN stream to be
used. The RN-context may be derived from the return address of the function call to
the random number generator.

[0036] FIG. 1 is a block diagram illustrating a random number generator
that provides distinct random number streams to different program contexts of a
parallel computation. Parallel computation 100 includes processes 102. In some
embodiments, processes 102 each include SPMD-based parallel application code for
carrying out parallel computation 100. Processes 102 include contexts 104. Each of
contexts 104 may correspond to a location in the code of one of processes 102.

[0037] Random number generator 106 may provide random number
streams to contexts 104 in processes 102. Each of contexts 104 may make calls 108
requesting random numbers. In response, random number generator 106 may
provide a random number stream 110 to the context. In some embodiments, each
random number stream 110 is generated from, or retrieved from, one of library
modules 114.

[0038] In some embodiments, a distinct stream is provided to each random
number context. For example, the random number stream provided to context A of
process 1 may be distinct from the random number stream provided to context B of
process 1, which may be different from the random number stream provided to
context C of process 1, and so on.

[0039] Each of processes 102 may include multiple iterations 112. Each
of iterations 112 may be associated with an iteration number. For each of iterations
112 of processes 102, context 104 may separately call for a random number stream.

[0040] In some embodiments, random number context (RN-context) is
used with other information to determine an RN stream to be used for a computation.
The RN-context may be derived from the return address of the function call to the
random number generator, a process number or thread number, an iteration number (if
appropriate), any user-supplied stream identifier, or a combination of one or more of
these elements. A user-supplied stream identifier may be, for example, an index to
RN stream contexts or a pointer to a data structure containing the RN stream context.
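As an illustration of how such an RN-context might be assembled, the sketch below builds a key from the call site together with optional extra elements. It is only a sketch under stated assumptions: Python exposes no return address, so the caller's file name and line number stand in for it (this relies on CPython's `inspect.currentframe()`), and the function `rn_context` and its parameters are hypothetical names, not part of any library described here.

```python
import inspect
import os


def rn_context(stream_id=0, iteration=None, use_process_id=False):
    """Build a hashable RN-context key for the code location that called us."""
    caller = inspect.currentframe().f_back
    # Stand-in for the return address: where in the program the request came from.
    call_site = (caller.f_code.co_filename, caller.f_lineno)
    process_id = os.getpid() if use_process_id else None
    return (call_site, stream_id, iteration, process_id)


def site_a():
    return rn_context(stream_id=7)   # one program location


def site_b():
    return rn_context(stream_id=7)   # a different program location
```

Repeated calls from the same location yield the same key, while calls from different locations yield different keys even though the user-supplied `stream_id` is identical.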

[0041] FIG. 2 is a block diagram illustrating a random number generator
that can provide distinct random number streams to different program contexts and
different processes of a parallel computation based on program context and other
information. An application's request for a random number may provide
user-specified stream ID 120 to library module 114. A process ID 122 may be associated
with each of processes 102. An iteration number 124 may be associated with each
iteration of a process. User-specified stream ID 120, process ID 122, and iteration
number 124 may be accessed by random number generator 106. In some
embodiments, random number generator 106 uses one or more of user-specified
stream ID 120, process ID 122, and iteration number 124, in combination with context
information associated with one of contexts 104, to determine the random number
stream to be used to provide one or more random numbers to the context. The random
number stream may be initialized if it is not already initialized, as in the case of the
first call to this stream.

[0042] Each of processes 102 may have a unique process ID 122. Random
number generator 106 may provide a distinct stream to each program context and
process. Thus, for example, random number stream 115 supplied to Context A of
process 2 in response to call 113 may be distinct from random number stream 110
supplied to Context A of process 1 in response to call 108.

[0043] In one embodiment, context-aware parallel pseudorandom number
generators are implemented as library modules that can be linked to application codes
at compile time. Random numbers may be retrieved from the CPRNG library
using function calls at run time.

[0044] FIG. 3 illustrates providing random number streams to a process
based on a determined program context. At 200, one or more program contexts
within a process are determined. Each of the program contexts may include code
that calls for one or more random numbers. For example, referring to FIG. 1, process
1 includes Context A, Context B, and Context C.

[0045] At 202, a random number stream is provided for each of the
program contexts based on the determined program context. For example, referring
to FIG. 1, random number generator 106 may provide a distinct random number
stream to each of Context A, Context B, and Context C in process 1. For example,
random number stream 111 provided to Context B in response to call 109 may be
distinct from random number stream 110 provided to Context A in response to call
108.


[0046] In some embodiments, random number streams are generated for
two or more processes in a parallel computation. The random number streams may
be provided such that the random number streams used by one process are distinct
from those of other processes. In certain embodiments, streams are generated such
that the corresponding contexts of different parallel processes are provided with
distinct random number streams. For example, random number generator 106 may
provide a random number stream to context A of process 1 that is distinct from the
random number stream provided to context A of process 2.

[0047] In some embodiments, a parameterized pseudorandom number
generator (RNG) is used to generate a large number of random number (RN) streams.
The RNG may be augmented with a scalable and automatic initialization process.
Parameterized PRNGs that may be used in some embodiments of a context-aware
random number generator include an additive lagged Fibonacci generator (ALFG) or
a multiplicative lagged Fibonacci generator (MLFG).

[0048] An additive lagged Fibonacci generator (ALFG) uses an addition-based
recursion:

[0049] $x_n = x_{n-k} + x_{n-l} \pmod{2^m}, \quad 0 < k < l < n,$

[0050] where $l$ and $k$ are the lags (or indices to the older numbers used to
generate the new number), $n$, $l$, $k$ are positive integers, and the $x_i$'s are $m$-bit random
numbers. The values $l = 17$ and $k = 5$ are commonly used to generate multiple
distinct streams of 32- or 64-bit RNs. However, to pass very stringent intra-stream
correlation tests, the lag $l$ needs to be very high, over 1000.

[0051] A drawback of ALFG may be the initialization cost of $l$ words
before generating any RNs that can be used by the application code.

[0052] An advantage of ALFG may be that it has a large number of
independent and long cycles of RNs. For a $b$-bit ALFG with lag $l$, there are
$2^{(b-1)(l-1)}$ cycles, each of length $(2^l - 1)2^{b-1}$.
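A minimal sketch of the additive recursion follows, assuming the caller supplies the $l$ seed words directly. The class name `ALFG` and its interface are hypothetical and chosen only for illustration; the defaults use the lags $l = 17$, $k = 5$ mentioned above.

```python
from collections import deque


class ALFG:
    """Additive lagged Fibonacci generator: x_n = x_{n-k} + x_{n-l} mod 2^m."""

    def __init__(self, seed_words, k=5, l=17, m=32):
        if not 0 < k < l:
            raise ValueError("lags must satisfy 0 < k < l")
        if len(seed_words) != l:
            raise ValueError("exactly l seed words are required")
        self.k = k
        self.mask = (1 << m) - 1
        # The state is the most recent l words, oldest first.
        self.state = deque((w & self.mask for w in seed_words), maxlen=l)

    def next(self):
        # state[0] is x_{n-l}; state[-k] is x_{n-k}.
        x = (self.state[-self.k] + self.state[0]) & self.mask
        self.state.append(x)   # maxlen drops the oldest word automatically
        return x


gen = ALFG(list(range(1, 18)))
first_two = [gen.next(), gen.next()]
```

The constructor makes the initialization cost visible: all $l$ words must exist before the first usable RN is produced.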

[0053] A multiplicative lagged Fibonacci generator (MLFG) is similar to
ALFG except that multiplication instead of addition is used in the recursion. MLFG
has only one-fourth as many cycles, each only one-fourth as long as those in
ALFG. MLFGs may be suitable in many embodiments of a CPRNG, since even with
a small lag of 17, it may be feasible to generate RN streams that pass many of the
stringent tests.

[0054] The multiplicative lagged Fibonacci generator (MLFG) uses the
recurrence relation

[0055] $x_n = x_{n-k} \times x_{n-l} \pmod{2^m}, \quad 0 < k < l < n,$

[0056] where $m$ is the random integer size in bits, $l$ and $k$ are the lags
or offsets into the stream of previously generated random numbers, and $x_i$, $i > l$, are
the random numbers generated. RNs $x_1, x_2, \ldots, x_l$ form the initialization (seed)
sequence or state and the initial words of an RN stream. The state of an RN stream is
always given by its most recent $l$ words. Theoretical results show that each distinct
combination of certain $(m - 3) \times (l - 1)$ of the $l \times m$ bits in the seed gives a distinct
RN stream, for a total of $2^{(m-3)(l-1)}$ streams, each with a cycle of
$2^{m-3} \times (2^l - 1) \approx 2^{m+l-3}$ RNs. Therefore, there are $(m - 3) \times (l - 1)$ bits that
may need to be determined uniquely for each RN stream initialization (seed) sequence.
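The multiplicative recurrence can be sketched in the same way. This is an illustrative sketch rather than the SPRNG implementation: the class name `MLFG` is hypothetical, the seed words are passed in directly, and the sketch simply rejects even seed words, since a product of odd words modulo $2^m$ stays odd.

```python
from collections import deque


class MLFG:
    """Multiplicative lagged Fibonacci generator: x_n = x_{n-k} * x_{n-l} mod 2^m."""

    def __init__(self, seed_words, k=5, l=17, m=64):
        if not 0 < k < l:
            raise ValueError("lags must satisfy 0 < k < l")
        if len(seed_words) != l:
            raise ValueError("exactly l seed words are required")
        if any(w % 2 == 0 for w in seed_words):
            raise ValueError("all seed words must be odd")
        self.k = k
        self.mask = (1 << m) - 1
        # The state is the most recent l words, oldest first.
        self.state = deque((w & self.mask for w in seed_words), maxlen=l)

    def next(self):
        # state[0] is x_{n-l}; state[-k] is x_{n-k}.
        x = (self.state[-self.k] * self.state[0]) & self.mask
        self.state.append(x)
        return x


gen = MLFG([2 * i + 1 for i in range(17)])   # seed words 1, 3, ..., 33
outputs = [gen.next() for _ in range(4)]
```

Because every output is odd, the least significant bit carries no randomness, which is one reason low-order bits are typically dropped before the numbers are handed to an application.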

[0057] A 64-bit MLFG with lag 17 may be implemented in one example.
With 64-bit integers and a lag of 17, there are $2^{61 \times 16} = 2^{976} \approx 6 \times 10^{293}$
different RN streams, each with a distinct 976-bit seed value and a cycle length of
$2^{61}(2^{17} - 1) \approx 2^{78} \approx 3 \times 10^{23}$. A few of the lower bits of the $x_i$'s may be
discarded and the remaining bits used to supply the RNs, which improves the
randomness since the lower bits are often less random owing to the arithmetic
operation involved. The random numbers may be provided as integers or as real
numbers in the range [0,1) by computing the fractions resulting from the division of
the integer $x_i$'s by 1 + max_rn, where max_rn is the maximum value an $x_i$ may take.
In one embodiment, a PRNG package called SPRNG and the MLFG available from its
library are used to implement a CPRNG.
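The figures quoted for the 64-bit, lag-17 case can be checked with exact integer arithmetic, and the mapping to [0,1) can be sketched alongside. The number of discarded low-order bits (11, which leaves 53 bits so the quotient is exactly representable as a double) is an assumption made for this sketch, as is the helper name `to_unit_interval`.

```python
M_BITS = 64   # word size m
LAG = 17      # lag l

free_bits = (M_BITS - 3) * (LAG - 1)                  # uniquely determined seed bits
num_streams = 2 ** free_bits                          # 2^976 distinct streams
cycle_length = 2 ** (M_BITS - 3) * (2 ** LAG - 1)     # just under 2^78


def to_unit_interval(x, m=M_BITS, drop_low_bits=11):
    """Map an m-bit integer to [0, 1) after discarding its low-order bits."""
    kept = x >> drop_low_bits
    max_rn = (1 << (m - drop_low_bits)) - 1   # largest value `kept` can take
    return kept / (1 + max_rn)


largest = to_unit_interval((1 << M_BITS) - 1)   # strictly below 1.0
```

Dividing by 1 + max_rn rather than max_rn is what keeps the upper endpoint of the interval open.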

[0058] In one implementation of context-aware random number
generation, a SPRNG library package provides init_rng() and get_rn_dbl() function
calls to initialize a new RN stream and to obtain the next RN in an already initialized
stream, respectively. The init_rng() function is called by specifying the seed, a
parameter set that specifies the lags and the locations of the odd numbered words in
the initial set of lag words, the maximum number of RN streams (denoted max_str) that
will be requested by the application, and cur_str, the RN stream number in the range
0, 1, ..., max_str-1 that needs to be initialized. The seed, parameter set, and max_str
may be common to all init_rng() calls. Each call to the init_rng() function returns a
pointer to one RN stream.

[0059] In one embodiment of a CPRNG implementation, each init_rng()
call allocates not just one RN stream but a set of distinct RN streams and returns a
pointer, str_ptr, to the set; the streams in this set can be customized with program
context without further calls to init_rng(). The RN-context, that is, the context or
program location from which an RN is requested, is used in addition to the stream-set
pointer, str_ptr, to determine the specific RN stream to be used. The RN-context may
be derived from a combination of the program line number in the source code, the
return address of the function call to get_rn_dbl(), the process/thread numbers, and
any user-supplied identifiers such as the iteration number. When the application
requests a random number using the function call get_rn_dbl(str_ptr), the RN-context
is used to determine the specific RN stream to be used in the set of streams
pointed to by str_ptr. The appropriate RN stream may be automatically initialized with
the RN-context, if it is the first call from this context, and an RN from the stream is
returned.
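The stream-set behaviour can be mimicked in a few lines. This sketch rests on several assumptions and is not the CPRNG or SPRNG code: the Python functions `init_rng` and `get_rn_dbl` only borrow the names used in the text, the call site (file name and line number) stands in for the return address, Python's built-in generator stands in for an MLFG stream, and a SHA-256 digest is used merely to turn the seed, cur_str, and context into a repeatable per-stream seed.

```python
import hashlib
import inspect
import random


class StreamSet:
    """A set of RN streams sharing one seed and stream number (cur_str)."""

    def __init__(self, seed, cur_str):
        self.seed = seed
        self.cur_str = cur_str
        self.streams = {}   # RN-context -> lazily created stream


def init_rng(seed, cur_str):
    return StreamSet(seed, cur_str)


def get_rn_dbl(str_ptr):
    caller = inspect.currentframe().f_back
    context = (caller.f_code.co_filename, caller.f_lineno)
    stream = str_ptr.streams.get(context)
    if stream is None:
        # First call from this context: initialize a stream for it.
        material = repr((str_ptr.seed, str_ptr.cur_str, context)).encode()
        stream_seed = int.from_bytes(hashlib.sha256(material).digest(), "big")
        stream = random.Random(stream_seed)
        str_ptr.streams[context] = stream
    return stream.random()


def demo(str_ptr, steps):
    a, b = [], []
    for _ in range(steps):
        a.append(get_rn_dbl(str_ptr))   # context A
        b.append(get_rn_dbl(str_ptr))   # context B
    return a, b
```

In `demo`, both call sites pass the same `str_ptr`, yet each is served from its own stream, and rerunning with the same seed and cur_str reproduces the same values.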

[0060] Each call to init_rng() may result in the initialization of the RN
stream specified by the stream number, cur_str, and the calling code is given a pointer
to the RN stream that should be used as the argument in the function call get_rn_dbl()
to obtain the next RN in the stream.

[0061] In this example embodiment, CPRNG differs from the MLFG in
the SPRNG package in several ways: (a) it automatically generates distinct RN streams
based on program context for the same str_ptr value; (b) its initialization method
seeds RN streams to improve the randomness and also to ensure that RN context can
be added to dynamically create distinct RN streams without requiring additional
init_rng() calls; (c) its distinct ID field allocates distinct values for a portion of
the seed sequence statically (when the cur_str value is less than the max_str value in
the function call init_rng()) and additional seed sequences dynamically beyond the
max_str limit in case the application requires more RN streams than originally
estimated. Extensive statistical tests are used to show that the CPRNG implementation
of MLFG generates billions of RN streams with low interstream correlations while the
implementation of the same theoretical generator in SPRNG exhibits statistically
significant correlations for more than a million streams. The specification of max_str
limits the maximum number of cur_str values that can be used to call init_rng() in the
SPRNG implementation, whereas in CPRNG max_str is a threshold that determines
whether the initialization sequences are allocated statically or dynamically. Static
allocation of the seed sequences improves repeatability of the computations when
rerun with the same input data, and dynamic allocation of seed sequences relieves the
burden of specifying the maximum number of stream allocations needed a priori.
Context-awareness provides distinct RN streams for distinct program contexts even
when the str_ptr used in the calls to get_rn_dbl() is the same. In the SPRNG
implementation, the application needs to be coded explicitly to use a different str_ptr
in calling get_rn_dbl() to achieve the same functionality. In this example embodiment,
CPRNG may avoid such application coding and automate the management of distinct
streams for distinct contexts.

[0062] FIG. 4 illustrates one embodiment of the initialization process by
CPRNG. In the example shown in FIG. 4, the initialization may be based on lag
parameters $l$ and $k$, $0 < k < l - 1$. A call to init_rng() results in initialization of
$l - 3$ of the lag words using a sequential RNG such as the recursion with carry (RWC)
generator, a 32-bit generator, initialized with the user-specified seed integer. In this
example, these lag words are common to the initialization of all RN streams
regardless of the process number or RN-context. One of the remaining three lag
words is filled with an ID that is guaranteed to be distinct for distinct cur_str numbers
specified in init_rng(). The distinct ID word is common to the set of RN streams that
are allocated based on different RN contexts but have the same cur_str number. The
remaining two lag words are filled with the RN-context so that distinct RN-contexts
result in distinct RN streams.

[0063] FIG. 4 shows the initialization of the RN stream
state by CPRNG. In this example, the state consists of $l$ lag words. Each lag word is
a 32-bit or, more typically, 64-bit word. With maximum lag $l$, $l - 3$ of the lag words
are filled randomly based on the user-specified seed and a sequential RNG. In this
example, these words are common to all RN streams used during the execution of the
application. Lag $k$, $k < l - 1$, is initialized with a unique and distinct ID that is
associated with the cur_str used in the init_rng() call. Lags $k + 1$ and $k + 2$ are
initialized with the RN-context to create a distinct RN stream for each distinct program
context in each process.
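The layout of the seed sequence described for FIG. 4 can be sketched as follows. This is an illustration under assumptions: Python's built-in generator stands in for the sequential RWC generator, the function name `build_seed_fields` is hypothetical, the split of the RN-context into a low half (lag $k + 1$) and a high half (lag $k + 2$) is one arbitrary choice, and the values produced are the $(m - 3)$-bit fields before any transformation into actual odd lag words.

```python
import random


def build_seed_fields(user_seed, distinct_id, rn_context, l=17, k=5, m=64):
    """Return l fields of (m - 3) bits each, indexed by lag 1..l (list index lag - 1)."""
    if not 0 < k < l - 1:
        raise ValueError("lag k must satisfy 0 < k < l - 1")
    field_bits = m - 3
    field_mask = (1 << field_bits) - 1
    if distinct_id < 0 or distinct_id >> field_bits:
        raise ValueError("distinct_id must fit in m - 3 bits")
    if rn_context < 0 or rn_context >> (2 * field_bits):
        raise ValueError("rn_context must fit in 2 * (m - 3) bits")

    common = random.Random(user_seed)   # stand-in for the sequential RWC generator
    fields = []
    for lag in range(1, l + 1):
        if lag == k:
            fields.append(distinct_id)                  # distinct ID word
        elif lag == k + 1:
            fields.append(rn_context & field_mask)      # low half of RN-context
        elif lag == k + 2:
            fields.append(rn_context >> field_bits)     # high half of RN-context
        else:
            fields.append(common.getrandbits(field_bits))   # common to all streams
    return fields
```

Two streams built with the same user seed and distinct ID but different RN-contexts differ only in the two context fields, which is what allows a new stream to be created on first use without another initialization call.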

[0064] For MLFG, all the lag words are odd values. Therefore,
only $(m - 3)$ bits of each lag word in an $m$-bit MLFG are determined uniquely,
and a least significant bit determined by the canonical form and parameter set is
appended to them to form an $(m - 2)$-bit number, say, $z$. The actual lag word may be
formed by using the operation $(-1)^y 3^z \bmod 2^m$, where $y$ is a randomly generated 1
or 0. Henceforth, the discussion of a lag word initialization pertains to the generation
of the $(m - 3)$ bits since every initial lag word will be transformed using the operation
$(-1)^y 3^z \bmod 2^m$. For a 64-bit MLFG, two consecutive 32-bit RNs generated by the
RWC generator may be used to form a 61-bit integer for the lag words filled by it.
Similarly, only 61 bits of each of the lag words used for the distinct ID word and the RN
context words need to be determined uniquely.
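The transformation into an odd lag word can be written directly with modular exponentiation. In this sketch the function name `canonical_lag_word` and its argument names are hypothetical; `field` is the uniquely determined $(m - 3)$-bit value, `low_bit` is the appended least significant bit, and `y` selects the sign.

```python
def canonical_lag_word(field, low_bit, y, m=64):
    """Return (-1)**y * 3**z mod 2**m, where z = (field << 1) | low_bit."""
    if field < 0 or field >> (m - 3):
        raise ValueError("field must fit in m - 3 bits")
    if low_bit not in (0, 1) or y not in (0, 1):
        raise ValueError("low_bit and y must each be 0 or 1")
    modulus = 1 << m
    z = (field << 1) | low_bit       # the (m - 2)-bit exponent
    word = pow(3, z, modulus)        # 3**z mod 2**m, always odd
    return (modulus - word) % modulus if y else word


# Small-modulus check: with m = 8 the construction reaches every odd residue.
reachable = {
    canonical_lag_word(f, b, y, m=8)
    for f in range(1 << 5) for b in (0, 1) for y in (0, 1)
}
```

With $m = 8$ the set `reachable` contains all 128 odd values below 256, which illustrates why distinct $(y, z)$ pairs give distinct odd lag words.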

[0065] In some embodiments, the number of bits used for the distinct ID may
be more or fewer than $m - 3$ bits, and more than one lag word or only a portion of a
lag word may be used. Up to $l - 2$ lag words are available for distinct ID
specification. Similarly, the number of bits used for RN context may be more or fewer
than the $2(m - 3)$ bits used in the example embodiment in FIG. 4. Furthermore, the
positions of distinct ID bits and RN context bits can be anywhere in the
$(m - 3) \times (l - 1)$ bits available to seed distinct RN streams. Any bits not used for
the distinct ID and RN context fields will be randomly filled with the RWC or some other
good sequential random number generator initialized with a user-supplied 32-bit or
64-bit single-seed value.

[0066] For a CPRNG based on MLFG with maximum lag $l = 17$ and 64-bit
words, $2^{2 \times 61} = 2^{122}$ distinct RN streams may be allocated with each init_rng() call.
Based on the context and str_ptr argument used in a call to get_rn_dbl(), an
appropriate stream is selected, automatically initialized prior to first use, and the next
RN in the stream is returned. CPRNG may be used without RN-contexts by choosing
appropriate parameters to the init_rng() call. If RN-contexts are not used, then the two
lag words that are normally filled with RN-context are filled with the random bits
generated by the sequential RWC generator. The lag word with the distinct ID may be
used to ensure that RN streams are distinct for distinct values of cur_str specified in
the init_rng() call. CPRNG may be simply a basic MLFG when used without context.

[0067] For applications that use a large and variable number of RN
streams, having to specify the maximum number of streams used during an execution
run is a limitation. Furthermore, certain large-scale parallel applications may spawn
additional processes and threads dynamically depending on the input data and
intermediate results. To accommodate such situations, CPRNG may assign several
($2^{10}$ in the example embodiment) consecutive distinct IDs for the lag word $k$ upon a
call to init_rng(), independent of any streams allocated to handle RN contexts.
Therefore, CPRNG may allocate to the calling process multiple initialization (seed)
sequences, which can be used to initialize distinct RN streams by simply initializing
the distinct ID lag word based on the unused distinct IDs allocated and keeping the
other initialization words the same. Typically, only one of these IDs is used by a process.
However, if a process spawns threads or child processes and needs to use additional
distinct RN streams without going through the initialization process, it can obtain them
without any communication overhead by using the original initialization with the
distinct ID word replaced with one of the unused IDs from its allocated IDs. This
leads to faster initialization of the new RN streams on demand. If more RN streams
are needed and init_rng() is called with a cur_str value greater than max_str, a
monotonically increasing counter is used to ensure that the lag word $k$ is distinct.
However, access to this counter may need to be serialized by using appropriate
mutex locks in threaded applications or by assigning it to a process that serves the
counter values to the other processes of the application. In these instances, an
additional communication or serialization overhead may be incurred by CPRNG
compared to the static methods used in some packages. On the other hand, CPRNG
provides a virtually unlimited number of RN streams on demand, limited only by the
number of bits used for the distinct ID, and avoids depletion of the available RN
streams that can occur with static partitioning of the available RN streams for
applications with many levels of dynamic process/thread creation.
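One way to picture the static and dynamic allocation of distinct IDs is the sketch below, written for a threaded application. It is an assumption-laden illustration: the class `DistinctIdAllocator`, the choice to give stream number cur_str the block of IDs starting at cur_str times $2^{10}$, and the decision to start the dynamic counter at max_str are all hypothetical details not specified in the text.

```python
import threading

IDS_PER_CALL = 2 ** 10   # consecutive distinct IDs handed out per init call


class DistinctIdAllocator:
    """Hands out blocks of distinct IDs, statically or via a locked counter."""

    def __init__(self, max_str):
        self.max_str = max_str
        self._next_dynamic_block = max_str   # dynamic blocks follow the static ones
        self._lock = threading.Lock()

    def allocate(self, cur_str):
        """Return the range of distinct IDs for one initialization call."""
        if 0 <= cur_str < self.max_str:
            block = cur_str                  # static: no shared state touched
        else:
            with self._lock:                 # dynamic: serialize counter access
                block = self._next_dynamic_block
                self._next_dynamic_block += 1
        start = block * IDS_PER_CALL
        return range(start, start + IDS_PER_CALL)


allocator = DistinctIdAllocator(max_str=4)
static_ids = allocator.allocate(1)       # repeatable across runs
dynamic_ids = allocator.allocate(10)     # beyond max_str, taken from the counter
```

A process would normally use the first ID of its block and keep the rest in reserve for threads or child processes it spawns later. Only the dynamic path takes the lock, which corresponds to the serialization overhead noted above.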

[0068] In some existing parallel random number generators (PRNG), only
the user-supplied stream identifier is used to determine the RN stream, thus leaving
the burden of managing multiple RN streams to the user. This can be onerous,
especially if the application is iterative and RNs are consumed at multiple locations in
each iteration. Use of a CPRNG may relieve a user from managing multiple streams
for each thread or process. In some embodiments, process/thread numbers
may be used in addition to context information. The option of using the process/thread
number to determine RN contexts may be selected by a user at compile time or
run time. Use of a process/thread number in determining the RN context may reduce
reproducibility of results.

[0069] In some embodiments, once a unique RN-context is determined, RN-context
information may be embedded into a seed sequence to initialize an RN stream. The seed
sequence may be, for example, a 976-bit sequence for a 64-bit MLFG with lag 17. In some
cases, it may be sufficient to limit the RN-context size to, for example, two lag words
(122 bits; only 61 bits of each 64-bit lag word are determined, and the remaining three
bits are determined by a canonical form used to initialize the lag words). The RN-context
may be concatenated with an additional deterministically generated distinct ID (one lag
word, or 61 bits) to further distinguish the initialization of RN streams. The remaining
bits may be filled randomly using a good sequential RNG, such as a recursion with carry
(RWC) generator, using a user-supplied seed integer. These random bits may be common to
the initialization of all RN streams.
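
By way of illustration only, the layout of such a seed sequence may be sketched in Python
as follows. Because 976 = 16 x 61, the sketch treats the sequence as sixteen 61-bit fields:
two for the RN-context (122 bits), one for the distinct ID (61 bits), and thirteen for the
shared random fill (793 bits). The field order, the function name build_seed_sequence, and
the use of Python's built-in generator as a stand-in for the RWC generator are assumptions
made for this example.

```python
import random

WORD_BITS = 61                   # bits of each lag word set by the seed sequence
NUM_WORDS = 16                   # 16 * 61 = 976 bits
MASK = (1 << WORD_BITS) - 1


def build_seed_sequence(rn_context, distinct_id, user_seed):
    """Lay out context (2 words), distinct ID (1 word), and random fill (13 words)."""
    if not 0 <= rn_context < (1 << (2 * WORD_BITS)):
        raise ValueError("RN-context must fit in 122 bits")
    if not 0 <= distinct_id <= MASK:
        raise ValueError("distinct ID must fit in 61 bits")
    words = [rn_context >> WORD_BITS, rn_context & MASK, distinct_id]
    filler = random.Random(user_seed)  # stand-in for the RWC generator
    # The fill depends only on user_seed, so it is common to all streams.
    words += [filler.getrandbits(WORD_BITS) for _ in range(NUM_WORDS - len(words))]
    return words
```

Two streams built with the same user seed share the last thirteen fields and differ only
in the context and ID fields, which is what keeps their initializations distinct.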

[0070] In some embodiments, a CPRNG implements a scalable initialization of RN streams. In
one embodiment, the CPRNG initializes RN streams using a return address, any user-supplied
identifier, seed information, and additional information that is generated by a CPRNG
library. This additional information may be generated in different ways depending, for
example, on the application code.

[0071] If the application is an MPI-based parallel program using the single-program,
multiple-data (SPMD) programming model, then a special CPRNG module may be associated with
process 0. The user need not be aware of this module and is not expected to modify the
application code. This CPRNG module may allocate several (for example, 2^10) consecutive
distinct 64-bit IDs in response to each initialization request. Each RN context may be
augmented with one of the distinct IDs.
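
By way of illustration only, the allocation of consecutive ID blocks may be sketched in
Python as follows. The sketch models the module associated with process 0 as an ordinary
object rather than an MPI process; the names IdBlockServer and augment_context and the
block size of 1,024 are assumptions made for this example.

```python
BLOCK_SIZE = 2 ** 10      # IDs handed out per request (illustrative)
ID_LIMIT = 2 ** 64        # distinct IDs are 64-bit values


class IdBlockServer:
    """Stand-in for the module associated with process 0."""

    def __init__(self, first_id=0):
        self._next = first_id

    def allocate_block(self):
        """Return the next block of consecutive, never-before-issued IDs."""
        start = self._next
        if start + BLOCK_SIZE > ID_LIMIT:
            raise OverflowError("64-bit ID space exhausted")
        self._next = start + BLOCK_SIZE
        return range(start, start + BLOCK_SIZE)


def augment_context(rn_context, distinct_id):
    """Pair an RN context with a distinct ID so equal contexts stay distinguishable."""
    return (rn_context, distinct_id)
```

Two processes that execute the same call site obtain the same RN context but different
blocks, so their augmented contexts differ.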

[0072] Some MPI processes dynamically spawn processes/threads that use RN streams. In some
embodiments, a process supplies its unused IDs to its child processes to automatically
ensure that RN streams are distinct. If a process runs out of its allocated distinct IDs,
then the CPRNG module may allocate additional distinct IDs. (In such instances, CPRNG may
incur additional communication overhead compared to the static methods used in some
packages.) Such an approach may require very little communication among the processes for
RN stream initialization.
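
By way of illustration only, the hand-off of unused IDs from a parent to a child may be
sketched in Python as follows. The class IdPool, the helper make_server, and the block size
of four IDs are hypothetical; the request_more callable stands in for a message to the
CPRNG module and is the only step that would involve communication.

```python
class IdPool:
    """Per-process pool of distinct IDs that can be shared with children."""

    def __init__(self, ids, request_more):
        self._ids = list(ids)
        self._request_more = request_more  # callable returning fresh IDs

    def take(self):
        """Use one ID, asking for a new block only when the pool is empty."""
        if not self._ids:
            self._ids = list(self._request_more())
        return self._ids.pop(0)

    def spawn_child(self, count):
        """Give `count` unused IDs to a child; they leave the parent's pool."""
        while len(self._ids) < count:
            self._ids.extend(self._request_more())
        given, self._ids = self._ids[:count], self._ids[count:]
        return IdPool(given, self._request_more)


def make_server(block=4):
    """Return a block-request function and a dict that counts the requests."""
    state = {"next": 0, "requests": 0}

    def request():
        start = state["next"]
        state["next"] += block
        state["requests"] += 1
        return range(start, start + block)

    return request, state
```

A child created with spawn_child() can start its own streams immediately; a request to
the server is made only after a pool has been used up.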

[0073] For parametric studies based on Monte Carlo simulations, the RN streams used for
each instance of the simulation can be ensured to be distinct by specifying the IDs (for
example, 64-bit IDs) to be used in an additional input file that is read by the CPRNG
library. A script (such as a Python script) may partition the ID space and generate the
additional input files.
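
By way of illustration only, such a partitioning script may be sketched in Python as
follows. The JSON layout, the file name pattern cprng_ids_NNNN.json, and the function names
are assumptions made for this example; the input format actually read by the CPRNG library
is not specified here.

```python
import json
import os


def partition_id_space(num_runs, id_bits=64):
    """Split the ID space into num_runs equal, non-overlapping ranges."""
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    size = (1 << id_bits) // num_runs  # any remainder at the top is left unused
    return [(i * size, (i + 1) * size - 1) for i in range(num_runs)]


def write_input_files(directory, num_runs):
    """Write one ID-range file per simulation instance and return the paths."""
    paths = []
    for run, (first, last) in enumerate(partition_id_space(num_runs)):
        path = os.path.join(directory, f"cprng_ids_{run:04d}.json")
        with open(path, "w") as fh:
            json.dump({"run": run, "first_id": first, "last_id": last}, fh)
        paths.append(path)
    return paths
```

Each simulation instance would then be started with its own file, so no two instances can
draw the same ID.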

[0074] In SPRNG and other works, the initialization for an RN stream may be determined
based on a user-supplied stream identifier and a seed integer. The seed integer may be,
for example, a 32-bit or a 64-bit integer. To handle the issue of new RN streams for
additional processes/threads spawned dynamically, the RN stream initialization space may
be partitioned statically using a binary partitioning scheme to ensure initialization
without any communication among processes. This can quickly deplete the initialization
sequences for applications with many levels of dynamic process/thread creation.
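
By way of illustration only, the depletion effect may be seen with a simple binary-tree
numbering in which the stream with index k gives its children the indices 2k + 1 and
2k + 2. This numbering is an assumption chosen for the example and is not presented as the
exact scheme of SPRNG or any other package.

```python
def max_spawn_depth(index_bits, start=0):
    """Count how many nested spawns fit before the child index leaves the space."""
    limit = 1 << index_bits
    depth, k = 0, start
    while 2 * k + 2 < limit:   # the child 2k + 2 must still be a valid index
        k = 2 * k + 2
        depth += 1
    return depth
```

Under this numbering, a chain of nested spawns exhausts a 32-bit index space after 31
levels, although only a few dozen of the roughly four billion indices have been used.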

[0075] Although certain of the embodiments described above relate to simulations, the
systems and methods described herein may be used in a variety of applications. Examples of
applications of the systems and methods described herein include (a) simulation-based
solutions to large scientific and engineering problems, (b) parameterized Monte Carlo
simulations of scientific, engineering, and finance problems, (c) distributed computing,
and (d) protocols and keys used for information assurance and security.

[0076] Systems and methods described herein may be implemented in hardware, including
field-programmable gate arrays (FPGAs) and application-specific integrated circuit (ASIC)
chips, or in a suitable combination of hardware and software, which can include one or
more software systems on a general-purpose processor (CPU) or graphics processing unit
(GPU).

[0077] Computer systems may, in various embodiments, include components such as a CPU with
an associated memory medium such as a Compact Disc Read-Only Memory (CD-ROM). The memory
medium may store program instructions for computer programs. The program instructions may
be executable by the CPU. Computer systems may further include a display device such as a
monitor, an alphanumeric input device such as a keyboard, a directional input device such
as a mouse, a voice recognition system to dictate text and issue commands for processing,
and a touch screen that may serve as a keyboard or mouse. Computer systems may be operable
to execute the computer programs to implement computer-implemented systems and methods. A
computer system may allow access to users by way of any browser or operating system.

[0078] Embodiments of a subset or all (and portions or all) of CPRNG may be implemented
and executed in a computer, and the random number streams and random numbers so generated
may be accessed via a computer network by at least one other computer executing the
application requesting random numbers.

[0079] Embodiments of a subset or all (and portions or all) of the code and data needed
for CPRNG operation (to initialize and maintain random number streams and to provide
random numbers from these streams) may be stored on a remote computer, which, in turn,
provides the said instructions and data via a computer network to at least one other
computer, which uses the received instructions and data to initialize and maintain random
number streams and provide random numbers for applications requesting the same.

[0080] Embodiments of a subset or all (and portions or all) of the above may be
implemented by program instructions stored in a memory medium or carrier medium and
executed by a processor. A memory medium may include any of various types of memory
devices or storage devices. The term “memory medium” is intended to include an
installation medium, e.g., a Compact Disc Read-Only Memory (CD-ROM), floppy disks, or a
tape device; a computer system memory or random access memory such as Dynamic Random
Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random
Access Memory (SRAM), Extended Data Out Random Access Memory (EDO RAM), Rambus Random
Access Memory (RAM), etc.; or a non-volatile memory such as magnetic media, e.g., a hard
drive, or optical storage. The memory medium may comprise other types of memory as well,
or combinations thereof. In addition, the memory medium may be located in a first computer
in which the programs are executed, or may be located in a second, different computer that
connects to the first computer over a network, such as the Internet. In the latter
instance, the second computer may provide program instructions to the first computer for
execution. The term “memory medium” may include two or more memory mediums that may reside
in different locations, e.g., in different computers that are connected over a network. In
some embodiments, a computer system at a respective participant location may include a
memory medium(s) on which one or more computer programs or software components according
to one embodiment may be stored. For example, the memory medium may store one or more
programs that are executable to perform the methods described herein. The memory medium
may also store operating system software, as well as other software for operation of the
computer system.

[0081] The memory medium may store a software program or programs operable to implement
embodiments as described herein. The software program(s) may be implemented in various
ways, including, but not limited to, procedure-based techniques, component-based
techniques, and/or object-oriented techniques, among others. For example, the software
programs may be implemented using ActiveX controls, C++ objects, JavaBeans, Microsoft
Foundation Classes (MFC), browser-based applications (e.g., Java applets), traditional
programs, libraries or standalone programs in a programming language such as C, C++, or
Java or in a scripting language such as Bash, Perl, Python, or AWK, or other technologies
or methodologies, as desired. A CPU executing code and data from the memory medium may
include a means for creating and executing the software program or programs according to
the embodiments described herein.

[0082] Further modifications and alternative embodiments of various aspects of the
invention may be apparent to those skilled in the art in view of this description.
Accordingly, this description is to be construed as illustrative only and is for the
purpose of teaching those skilled in the art the general manner of carrying out the
invention. It is to be understood that the forms of the invention shown and described
herein are to be taken as embodiments. Elements and materials may be substituted for those
illustrated and described herein, parts and processes may be reversed, and certain
features of the invention may be utilized independently, all as would be apparent to one
skilled in the art after having the benefit of this description of the invention. Methods
may be implemented manually, in software, in hardware, or a combination thereof. The order
of any method may be changed, and various elements may be added, reordered, combined,
omitted, modified, etc. Changes may be made in the elements described herein without
departing from the spirit and scope of the invention as described in the following claims.

Atty. Dkt. No.: 5660-14400
Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.

WHAT IS CLAIMED IS:


1. A method of providing random number streams to a process, comprising:

determining one or more program contexts within a process, wherein at least one of the
one or more program contexts comprises code that calls for one or more random numbers;
and

providing, for each of at least one of the program contexts, a random number stream to the
process, wherein the random number stream provided for at least one of the program
contexts is based at least in part on the determined program context, and wherein the
random number stream provided for at least one of the program contexts is distinct from
the random number stream for at least one other of the program contexts.

2. The method of claim 1, wherein each of the program contexts is provided a random number
stream that is distinct from the random number stream for any of the other program
contexts in the process.

3. The method of claim 1, wherein providing the random number stream to the process for
each of at least one of the two or more program contexts comprises providing a set of
distinct random number streams in response to a call from one of the program contexts.

4. The method of claim 1, wherein providing the random number stream to the process for
each of at least one of the two or more program contexts comprises initializing the states
of the random number streams, wherein the states are used to generate distinct random
number streams for at least two of the program contexts.

5.  The  method  of  claim  1,  wherein  the  random-number  context  is  determined 
based,  at  least  in  part,  on  the  return  address  of  a  function  call  to  obtain  a  random  number. 

6. The method of claim 1, wherein providing the random number stream to the process for
each of at least one of the two or more program contexts comprises embedding context
information into a seed sequence to initialize the random number stream.

7. The method of claim 1, wherein the process is one of two or more processes in a
parallel process computation, wherein the random number stream provided for at least one
of the program contexts is based in part on a process identifier for the process, wherein
the random number stream is distinct from the random number stream provided for program
contexts in at least one other process of the two or more processes in the parallel
process computation.

8. The method of claim 1, wherein the random number stream provided for at least one of
the program contexts is based in part on a user-supplied stream identifier for program
context.

9.  The  method  of  claim  1,  wherein  providing  the  random  number  stream  to  the 
process  for  each  of  at  least  one  of  the  two  or  more  program  contexts  comprises  receiving 
a  stream  identifier  stored  in,  or  generated  from,  a  library  module. 

10.  The  method  of  claim  1,  wherein  the  random  number  stream  provided  for  at 
least  one  of  the  program  contexts  is  based  in  part  on  an  iteration  number. 

11. The method of claim 1, wherein the random number stream provided for at least one of
the program contexts is based in part on a user-specified seed value.

12. The method of claim 1, wherein the process is a dynamically spawned process, wherein a
random number stream allocated to it is based in part on unused initialization sequences
from the random number streams originally allocated to the parent process from which the
process was spawned.


13. A system, comprising:

a processor;

a memory coupled to the processor, wherein the memory comprises program instructions
executable by the processor to implement:

determining one or more program contexts within a process, wherein at least one of the
one or more program contexts comprises code that calls for one or more random numbers;
and

providing, for each of at least one of the program contexts, a random number stream to the
process, wherein the random number stream provided for at least one of the program
contexts is based at least in part on the determined program context, and wherein the
random number stream provided for at least one of the program contexts is distinct from
the random number stream for at least one other of the program contexts.

14. The system of claim 13, further comprising:

a network of systems in which one or more systems may store portions or all of code and
data needed for CPRNG and compute or provide instructions or data needed to use CPRNG or
the random numbers to at least one or more other systems by way of the computer network.

15. The system of claim 13, wherein each of the program contexts is provided a random
number stream that is distinct from the random number stream for any of the other program
contexts in the process.

16. A non-transitory, computer-readable storage medium comprising program instructions
stored thereon, wherein the program instructions are configured to implement:

determining one or more program contexts within a process, wherein at least one of the
one or more program contexts comprises code that calls for one or more random numbers;
and

providing, for each of at least one of the program contexts, a random number stream to the
process, wherein the random number stream provided for at least one of the program
contexts is based at least in part on the determined program context, and wherein the
random number stream provided for at least one of the program contexts is distinct from
the random number stream for at least one other of the program contexts.

17. The computer-readable storage medium of claim 16, wherein the program instructions
further comprise:

CPRNG code and data in the storage medium of one computer accessed by way of a computer
network by another computer to initialize and maintain random number streams and generate
random numbers.

18.  The  computer-readable  storage  medium  of  claim  16,  wherein  each  of  the 
program  contexts  is  provided  a  random  number  stream  that  is  distinct  from  the  random 
number  stream  for  any  of  the  other  program  contexts  in  the  process. 

19. A method of providing random number streams to processes performing a parallel
computation, comprising:

determining one or more program contexts within one process of a parallel computation,
wherein the parallel computation includes two or more processes performed in parallel,
wherein each of the one or more program contexts comprises code that calls for one or more
random numbers; and

providing a random number stream to the one process for each of at least one of the one or
more program contexts, wherein the random number stream provided is based in part on the
determined program context and based in part on which of the two or more processes the
program context is in.

20. The method of claim 19, wherein each of the program contexts is provided a random
number stream that is distinct from the random number stream for any of the other program
contexts in the process.

21. The method of claim 19, further comprising:

determining one or more program contexts within a second process of the parallel
computation; and

providing a random number stream to the second process for each of at least one of the one
or more program contexts,

wherein the random number stream to the second process is determined based in part on the
determined program context and based in part on which of the two or more processes the
program context is in,

wherein the random number stream provided for a program context is distinct from the
random number stream provided for the program contexts in at least one other process of
the two or more processes in the parallel computation.

22. The method of claim 19, wherein the random number stream is distinct from the random
number stream provided for a corresponding program context in at least one other process
of the two or more processes in the parallel computation.

23. The method of claim 19, wherein the random number stream provided for at least one of
the program contexts is based in part on a process identifier for the process.

24. The method of claim 19, wherein the random number stream is distinct from the random
number stream provided for program contexts in at least one other process of the two or
more processes in the parallel computation.

25. The method of claim 19, wherein providing a random number stream to the one process
for each of at least one of the two or more program contexts comprises providing a random
number stream for each of at least two of the two or more program contexts,

wherein the random number stream provided for the program contexts is based at least in
part on the determined program context, and

wherein the random number stream provided for at least one of the program contexts is
distinct from the random number stream for at least one other of the program contexts.

26. The method of claim 19, wherein the random-number context is based, at least in part,
on the return address of a function call to obtain a random number.

27. A method of providing random number streams to processes performing a parallel
computation, comprising:

receiving a call for one or more random numbers from a program context in one process of a
parallel computation, wherein the one process is one of two or more processes performed in
a parallel computation; and

providing a random number stream to the one process for the program contexts, wherein the
random number stream provided is based at least in part on the determined program context.

28. The method of claim 27, wherein the random number stream provided is further based in
part on which of the two or more processes the program context is in.

29.  The  method  of  claim  28,  wherein  the  random-number  context  is  based,  at 
least  in  part,  on  the  return  address  of  a  function  call  to  obtain  a  random  number. 

Amendments to the Claims


This  listing  of  claims  will  replace  all  prior  versions,  and  listings,  of  claims  in  the  above- 
captioned  application: 

1. (Currently Amended): A method of dynamically providing random number streams to a
process, comprising:

determining, by a processing device, a plurality of program contexts within the process,
wherein each program context comprises calls for one or more random numbers; and

providing automatically, for each program context, a distinct random number stream,
wherein the random number stream provided for one of the program contexts is based at
least in part on the determined program context, and wherein the random number stream
provided for one of the program contexts is distinct from the random number stream for at
least one other of the program contexts.

2.  (Original):  The  method  of  claim  1,  wherein  each  of  the  program  contexts  is  provided  a 
random  number  stream  that  is  distinct  from  the  random  number  stream  for  any  of  the  other 
program  contexts  in  the  process. 

3.  (Original):  The  method  of  claim  1,  wherein  providing  the  random  number  stream  to  the 
process  for  each  of  at  least  one  of  the  two  or  more  program  contexts  comprises  providing  a  set  of 
distinct  random  number  streams  in  response  to  a  call  from  one  of  the  program  contexts. 

4. (Original): The method of claim 1, wherein providing the random number stream to the
process for each of at least one of the two or more program contexts comprises
initializing the states of the random number streams, wherein the states are used to
generate distinct random number streams for at least two of the program contexts.

5. (Currently amended): The method of claim 1, wherein one or more of the program contexts
includes one or more random-number contexts, and each of the random-number contexts is
determined based, at least in part, on a return address of a function call to obtain a
random number.

6. (Original): The method of claim 1, wherein providing the random number stream to the
process for each of at least one of the two or more program contexts comprises embedding
context information into a seed sequence to initialize the random number stream.

7.  (Original):  The  method  of  claim  1,  wherein  the  process  is  one  of  two  or  more  processes  in  a 
parallel  process  computation,  wherein  the  random  number  stream  provided  for  at  least  one  of  the 
program  contexts  is  based  in  part  on  a  process  identifier  for  the  process,  wherein  the  random 
number  stream  is  distinct  from  the  random  number  stream  provided  for  program  contexts  in  at 
least  one  other  process  of  the  two  or  more  processes  in  the  parallel  process  computation. 

8.  (Original):  The  method  of  claim  1,  wherein  the  random  number  stream  provided  for  at  least 
one  of  the  program  contexts  is  based  in  part  on  a  user-supplied  stream  identifier  for  program 
context. 

9. (Original): The method of claim 1, wherein providing the random number stream to the
process for each of at least one of the two or more program contexts comprises receiving a
stream identifier stored in, or generated from, a library module.

10.  (Original):  The  method  of  claim  1,  wherein  the  random  number  stream  provided  for  at  least 
one  of  the  program  contexts  is  based  in  part  on  an  iteration  number. 

11. (Original): The method of claim 1, wherein the random number stream provided for at
least one of the program contexts is based in part on a user-specified seed value.

12.  (Original):  The  method  of  claim  1,  wherein  the  process  is  a  dynamically  spawned  process, 
wherein  a  random  number  stream  allocated  to  it  is  based  in  part  on  unused  initialization 
sequences  from  the  random  number  streams  originally  allocated  to  the  parent  process  from 
which  the  process  was  spawned. 

13.  (Currently  Amended):  A  system,  comprising: 

a  processor; 

a  memory  coupled  to  the  processor,  wherein  the  memory  comprises  program  instructions 
executable  by  the  processor  to  implement: 

determining, using the processor, a plurality of program contexts within a process,
wherein each program context comprises calls for one or more random numbers; and

providing automatically, for each program context, a distinct random number stream,
wherein the random number stream provided for one of the program contexts is based at
least in part on the determined program context, and wherein the random number stream
provided for one of the program contexts is distinct from the random number stream for at
least one other of the program contexts.

14. (Original): The system of claim 13, further comprising:

a network of systems in which one or more systems may store portions or all of code and
data needed for CPRNG and compute or provide instructions or data needed to use CPRNG or
the random numbers to at least one or more other systems by way of the computer network.

15.  (Original):  The  system  of  claim  13,  wherein  each  of  the  program  contexts  is  provided  a 
random  number  stream  that  is  distinct  from  the  random  number  stream  for  any  of  the  other 
program  contexts  in  the  process. 

16.  (Currently  amended):  A  non-transitory,  computer-readable  storage  medium  comprising 
program  instructions  stored  thereon,  wherein  the  program  instructions  are  configured  to 
implement: 

determining  one  or  more  program  contexts  within  a  process,  wherein  at  least  one  of  the 
one  or  more  program  contexts  comprises  code  that  calls  for  one  or  more  random 
numbers;  and 

providing automatically, for each of at least one of the program contexts, a random number
stream to the process, wherein the random number stream provided for at least one of the
program contexts is based at least in part on the determined program context, and wherein
the random number stream provided for at least one of the program contexts is distinct
from the random number stream for at least one other of the program contexts.

17.  (Original):  The  computer-readable  storage  medium  of  claim  16,  wherein  the  program 
instructions  further  comprise: 

CPRNG code and data in the storage medium of one computer accessed by way of a computer
network by another computer to initialize and maintain random number streams and generate
random numbers.

18. (Original): The computer-readable storage medium of claim 16, wherein each of the
program contexts is provided a random number stream that is distinct from the random
number stream for any of the other program contexts in the process.


19-29.  (Canceled) 

ABSTRACT


A  method  of  providing  random  number  streams  to  a  process  includes  determining 
two  or  more  program  contexts  within  a  process.  Each  of  the  program  contexts  may 
include  code  that  calls  for  one  or  more  random  numbers.  For  each  of  at  least  two  of  the 
program  contexts,  a  random  number  stream  is  provided  to  the  process.  The  random 
number  stream  for  each  program  context  is  based  on  the  determined  program  context  and 
is  distinct  from  the  random  number  stream  for  the  other  program  contexts  in  the  process. 